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Abstract 

The  thesis  presents  an  approach  to  the  problem  of  global  register  allocation  as 
performed  by  an  optimizing  compiler.  The  problem  considered  is  actually  the  more 
general  one  of  choosing  what  physical  resource  within  the  target  machine  will  be 
used  to  hold  the  results  of  various  computations  in  a running  program.  The  results 
may  be  the  values  of  common  (redundant)  subexpressions,  partial  resu'ts  developed 
during  evpression  evaluation,  or  variables  declared  by  the  programmer.  An 
optimizing  compiler  can  make  better  use  of  the  resources  of  the  target  machine  if 
these  decisions  are  all  considered  together  at  or  near  the  source  level  rather  than 
being  distributed  throughout  the  compiler  and  operating  at  both  source  and  object 

levels. 

A decomposition  of  an  optimizing  compiler  is  presented  with  research  focusing 
on  one  part  of  the  compiler,  namely  the  part  which  assigns  the  computed  results  to 
physical  locations.  The  entities  for  which  locations  must  be  assigned  by  the  compiler 
are  uniquely  identified  by  temporary  names  (TNs).  The  process  of  binding  the  TNs  to 
actual  locations  is  called  TNBtND.  A further  decomposition  of  the  TNBIND  model  yields 
several  interesting  problems.  Three  of  these  problems  are  considered  in  greater 

detail. 

(1)  Specifying  the  way  in  which  particular  language  constructs  interact  with 
particular  target  machine  capabilities. 

(2)  Determining  the  lifetimes  of  TNs,  i.e.,  the  segments  of  the  program  during 
which  each  TN  contains  a valid  value.  This  is  similar  to  what  has  been  called  live- 

dead  analysis. 

(3)  Assigning  a large  number  of  TNs  to  the  relatively  few  physical  locations 
available.  This  is  related  to  so-called  "knapsack"  or  "cutting-stock"  problems  in 
operations  research. 

Several  versions  of  the  TNBIND  model  are  incorporated  into  the  Bliss-11 
compiler  and  compared  with  each  other  and  with  the  original  compiler  in  terms  of 
code  quality  and  compilation  time. 
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Chapter  1 
Introduction 


This  thesis  presents  an  approach  to  the  problem  of  global  register  allocation  as 
performed  by  an  optimizing  compiler.  The  problem  is  actually  more  general  than 
merely  choosing  one  register  or  another  to  hold  a particular  value.  A compiler  has 
the  freedom  and  responsibility  to  choose  what  physical  resource  will  be  used  to 
(temporarily)  store  information  during  the  execution  of  a program.  This  is  a freedom 
given  up  by  a programmer  when  he  changes  from  an  assembler  to  a compiler.  When 
a program  is  being  executed,  many  values  are  computed,  stored  temporarily,  used  and 
then  discarded.  At  the  time  of  compilation  a compiler  must  decide  which  values  will 
be  stored  in  which  locations  at  runtime.  The  thesis  does  not  include  questions  of 
dynamic  storage  allocation  or  other  questions  of  large  quantities  of  storage,  but  is 
restricted  to  consideration  of  problems  usually  associated  with  register  allocation. 

In  most  computers. there  are  several  classes  of  locations  which  may  be  used  to 
hold  information  (e.g.  registers,  main  memory,  stack).  In  addition  there  are  usually  a 
number  of  (logically)  identical  locations  within  each  class.  Because  the  primary 
function  of  a compiler  is  to  translate  faithfully  any  algorithm  presented  to  it,  the 
primary  goal  must  be  to  make  a consistent  s'  ‘ of  decisions  about  where  information 
will  be  stored,  i.e.  the  translation  must  perform  the  algorithm  described  by  the  source 
program.  This  thesis  is  concerned  with  the  secondary  goal  of  an  optimizing  compiler, 
namely  to  select  from  the  possible  translations  one  that  is  somehow  "better"  than  the 
others.  The  concern  here  is  only  with  the  decisions  relating  to  the  use  of  temporary 
storage.  Other  areas  of  optimization,  especial.y  common  subexpression  elimination 
and  code  motion  will  be  discussed  only  as  needed  to  provide  context  and  motivation 
for  the  discussion  of  temporary  storage. 
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1.1  Background 

For  many  purposes  the  advantages  gained  by  using  an  optimizing  compiler  are 
far  outweighed  by  the  cost  of  analyzing  the  alternative  code  sequences  in  such  a 
compiler.  In  fact  Hansen  [Han74]  has  suggested  that  some  types  of  programs  are 
more  naturally  run  in  the  environment  of  an  adaptive  compiler  that  initially  interprets 
programs  and  only  compiles  or  optimizes  as  execution  frequency  warrants.  On  the 
other  hand,  some  applications  demand  highly  efficient  programs.  In  this  latter 
category  come  operating  systems,  compilers,  and  other  programs  run  constantly  or 
very  frequently  but  which  are  compiled  infrequently.  Traditionally  such  programs 
have  been  written  in  assembly  languages.  By  programming  at  the  machine  instruction 
level  the  programmer  is  able  to  use  all  of  his  experience  and  Knowledge  of  both  the 
machine  and  the  problem  to  make  decisions  about  what  code  should  be  written  to 
implement  various  parts  of  his  algorithm.  Until  we  can  accurately  build  a model  of 
human  knowledge,  reasoning  and  associative  capacity  into  a compiler,  there  will 
continue  to  be  small  programs  or  parts  of  programs  on  which  an  experienced 
assembly  language  programmer  can  outperform  any  compiler  (In  terms  of  "goodness 
of  output  code). 

The  problem  with  the  traditional  approach  is  twofold.  1)  Assembly  language 
programs  do  not  lend  themselves  to  understandability,  modifiability  and  demonstration 
of  correctness  as  many  higher  level  language  programs  do.  2)  Tnere  are  not  enough 
good  assembly  language  programmers.  An  optimizing  compiler  is  an  attempt  to  solve 
this  problem.  The  advantages  of  higher  level  languages  (specifically 

understandability,  modifiability,  and  demonstration  of  correctness)  are  widely 
accepted  even  by  those  who  are  not  satisfied  with  the  efficiency  of  the  code 
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produced.  The  hope  is  that  programs  can  be  written  in  a high  level  language,  and 
that  an  optimizing  compiler  can  provide  translations  which  execute  with  the  efficiency 
of  good  assembly  language  programs. 

This  thesis  will  consider  only  that  part  of  an  optimizing  compiler  generally 
Known  as  the  register  allocator.  The  problem  considered  is  actually  the  more  general 
one  of  deciding  where  information  may  be  temporarily  stored  at  program  execution 
time;  the  locations  chosen  need  not  be  registers.  The  idea  is  to  make  optimal 
decisions,  but  optimality  is  both  hard  to  define  and  hard  to  measure.  In  terms  of 
register  allocation,  optimality  may  mean  minimum  object  code  size,  minimum  execution 
time,  minimum  number  of  registers  used,  minimum  number  of  loads  and  stores,  some 
combination  of  these,  or  some  other  criteria.  The  word  optimal  was  purposely 
omitted  from  the  title  of  the  thesis  because  there  can  be  no  optimal  solution  to  the 
register  allocation  problem  without  a good  deal  of  qualification  on  the  meaning  and 
measurements  of  optinnlity.  The  emphasis  here  is  on  providing  a general  framework 
in  which  particular  optimization  decisions  car  easily  be  made. 

Although  algorithms  for  specific  aspects  of  the  problem  have  been  presented, 

there  has  been  little  or  no  consideration  of  the  overall  problem,  A primary  goal  of 

•1 

this  thesis  is  to  present  a model  of  register  allocation  which  shows  the  interaction  of 
the  various  subproblems  and  allows  specific  solutions  to  those  subproblems  to  be 
easily  incorporated  into  a realization  of  that  model. 

In  the  long  term  this  research  is  directed  toward  compiler  compilers  -- 
programs  which  produce  compilers  from  a description  of  the  source  language  and  the 
target  machine.  Just  as  formalization  of  language  syntax  led  to  advancements  in 
mechanically  generated  parsers,  it  is  expected  that  the  formalization  of  the  register 
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allocation  problem  vill  lead  to  advancements  In  this  area  of  mechanically  generated 
optimizers. 

1,2  Issues  and  Subproblems 

There  are  many  issues  in  the  overall  register  allocation  problem.  Most  of  these 
issues  arise  because  we  want  to  produce  code  that  is  not  only  correct,  but  also 
"good"-,  many  of  the  issues  become  "non-issues"  if  we  are  concerned  only  with  the 
generation  of  correct  code.  The  issues  include  at  least  these: 

Evaluation  order 
Target  Path 
Machine  Requirements 
Run  time  environment 
Interaction  with  control  flow 

Each  of  these  is  discussed  in  more  detail  below. 

1.2.1  Evaluation  order 

Since  an  expression  must  be  evaluated  by  first  evaluating  its  subexpressions 
and  then  combining  them,  some  decision  must  be  made  about  the  order  in  which  the 
subexpressions  will  be  evaluated  and  where  the  resulting  values  will  be  stored  until 
needed.  A simple  example  will  help  illustrate  the  opl,ons.  Suppose  we  are  required 
to  evaluate  the  expression 
(a  * b)  + (c  * d) 

Using  a three  address  notation,  our  evaluation  order  choices  are 

a * b -»  tj 
c * d -»  t2 
ti+t2  -+  t3 

c * d -*  tj 


and 
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a * b -*  t2 
ti+t2  t3 

The  point  is  that  the  evaluation  of  the  expression  requires  that  the  two 
products  be  formed  befire  their  sum  can  be  formed.  Mathematically,  there  is  no 
restriction  placed  on  the  relative  order  in  which  the  products  are  formed.  Indeed,  if 
our  hardware  permits,  the  two  products  could  be  produced  In  parallel.  In  practice, 
however,  the  evaluation  order  may  be  somewhat  restricted.  The  restrictions  may  be 
due  to  the  language  definition,  e.g.  the  language  might  specify  strict  left  to  right 
evaluation  of  subexpressions.  The  evaluation  order  may  also  be  restricted  because 
of  possible  side-effacts  of  an  operend.  The  issue  of  evaluation  order  is  how  to 
decide  which  subexpression  to  evaluate  first.  In  the  example  of  Section  1.2.1  there  is 
no  obvious  preference  (Or  choosing  one  or  the  other  subexpression  as  the  one  to  be 
evaluated  first.  In  some  ;ases  choosing  the  correct  evaluation  order  is  the 
difference  between  "good"  and  "bad"  code. 

1.2.2  Target  path 

Although  similar  to  evaluation  order  determination,  the  selection  of  target  path 
is  in  fact  an  independent  decision.  In  the  example  above  we  avoided  the  target  path 
decision  by  using  a three  address  notation.  Very  'ew  computers  have  full  three 
address  capability,  The  last  of  the  three  address  instructions  above 

ti+t2  -*  <3 

may  have  two  realizations  on  most  machines 

h H *3 

t2+t3  -»  t3 

t2  -»  t3 
h+<3  -*1  l3 


or 


6 


An  Approach  to  Globa!  Register  Allocation 


The  target  path  is  the  sequence  of  subnodes  which  are  first  loaded  Into  result 
locations  and  then  operated  on  by  the  indicated  operators.  We  refer  to  an  operand 
lying  on  the  target  path  as  the  target  path  operand,  although  we  frequently  refer  to 
an  operand  as  being  the  target  path  since  at  any  particular  node  the  target  path  Is 
uniquely  identified  by  an  operand.  When  the  alternatives  are  made  explicit,  we  see 
that  the  initial  move  instruction  (tj  -*  tg  or  tg  -»  tg)  may  be  eliminated  if  it  is  possible 
that  tg  can  be  the  same  location  as  either  tj  or  tg.  In  the  simple  case,  tg  may  be 
assigned  to  either  location  and  we  may  make  an  arbitrary  choice.  In  other  cases 
more  global  context  may  restrict  our  decisions.  In  the  above  example,  let  us  suppose 
that  tj  (i.e.  a*b)  is  a redundant  subexpression  whose  value,  once  computed,  may  be 
used  several  times  without  recomputation.  In  this  case  tg  must  not  be  the  same 
location  as  t^  since  the  value  in  tj  must  be  preserved  for  later  use.  This  additional 
information  leads  us  to  choose  tg  as  the  target  path. 

1.2.3  Machine  requirements 

Different  computers  place  differing  requirements  on  the  operands  of  certain 
instructions.  Aside  from  the  special  instructions  unique  to  a given  machine,  these 
requirements  are  usually  restrictions  on  the  kind  of  location  in  which  the  operands  of 
the  normal  unary  and  binary  operators  may  reside.  The  available  kinds  of  locations 
may  include 

main  memory 
register 
top  of  stack 

special  register  (e.g,  floating  point,  index) 
pair  of  registers 

Any  general  model  of  the  temporary  storage  management  problem  must  provide  for 
these  dirfering  requirements. 
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1.2.4  Run  time  environment 


Language  designers/implementors  make  decisions  about  the  run  time 

environment  which  must  be  considered  by  a general  model  of  temporary  storage 

management.  These  decisions  are  essentially  extensions  of  the  machine 

requirements.  Some  of  the  questions  to  be  addressed  here  are 

How  many  registers  are  available  for  system  temporaries? 

Which  registers  are  safe  and  which  are  destroyed  across  subroutine 
calls? 

Are  there  display  pointers?  If  so  lire  they  fixed  or  dynamically 
allocated? 

How  are  parameters  to,  and  return  values  from,  subroutines  l.andled? 

What  is  the  interaction  with  library  and  system  functions  including 
debuggers? 

The  model  must  allow  for  some  degree  of  freedom  in  making  these  decisions. 


1.2.5  User  variables 


Should  variables  declared  and  used  by  the  programmer  to  temporarily  hold 
values  be  given  the  same  treatment  as  those  generated  by  the  compiler?  Program 
size  and  execution  time  car.  be  reduced  if  some  user  variables  can  reside  in 
registers.  If  we  are  to  consider  allocating  registers  for  user  variables,  it  is  logical 
that  we  do  that  at  the  same  time  we  allocate  registers  for  compiler  temporaries. 
While  machine  restrictions  and  the  run  time  environment  force  some  decisions  about 
which  values  must  or  must  not  be  in  registers,  we  may  be  able  to  produce  a "better" 
program  if  we  place  some  user  variables  in  registers,  even  at  the  expense  of  keeping 
some  compiler  temporaries  elsewhere. 
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1,2.6  Interaction  with  control  flow 

In  order  to  generate  "good"  code  we  must  make  the  best  use  of  the  resources 
available.  In  the  case  of  registers,  generally  a scarce  resource,  this  Implies  that  we 
must  be  able  to  recognize  precisely  when  a register  contains  valid  information  and 
when  it  does  not.  We  say  that  a location  contains  a valid  value  when  the  value  may 
be  referenced  before  a new  value  is  stored  into  the  location.  We  can  do  this  In  much 
the  same  way  a clever  assembly  language  programmer  does.  When  such  a 
programmer  needs  a register  for  some  value,  he  follows  the  control  flow  to 
determine  where  the  registers  currently  in  use  are  referenced  and  where  values  are 
stored  into  them,  A register  can  contain  several  logically  distinct  quantities  as  long 
as  no  more  than  one  of  the  quantities  has  a valid  value  at  any  point  in  the  program. 
This  issue  concerns  the  graph  theoretic  properties  of  the  program.  Suppose  that 
points  x and  y in  the  program  flow  graph  are  uses  of  some  variable  A,  and  that  at 
point  z a new  value  is  stored  into  A.  We  want  to  ask  questions  such  as  "Is  there  a 
path  from  x to  y that  does  not  pass  through  z?"  If  the  answer  is  no,  then  the  variable 
A does  not  contain  a valid  value  between  points  x and  z. 

Another  aspect  of  this  issue  is  the  use  of  a register  over  a small  piece  of 
program  to  hold  a variable  whose  value  is  normally  kept  in  memory.  When  a variable 
is  accessed  frequently  within  a small  segment  of  code,  program  performance  may  be 
improved  by  loading  that  variable  into  a register  before  the  segment,  thus  eliminating 
a memory  reference  for  each  access,  and  storing  it  back  into  memory  after  the  code 
segment.  This  optimization  is  most  commonly  applied  to  loops  because  the  savings 
are  multiplied  by  the  number  of  times  the  loop  is  executed. 
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1.3  Previous  work 

This  section  presents  some  of  the  major  results  in  the  area  of  register 
allocation.  Published  work  in  the  area  has  generally  been  limited  to  consideration  of 
simple  allocation  problems  in  straight  line  programs  and  optimal  evaluation  order  for 
expressions.  Little  published  material  is  available  on  the  problems  of  register 
allocation  in  the  presence  of  control  flow  constructs.  The  material  below  summarizes 
the  major  results. 

1.3.1  Index  register  allocation 

Horv/itz,  et.  al.  [Hor66]  discussed  index  register  allocation  in  a paper  oriented 
toward  FORTRAN-like  programs  (and  machines  like  the  IBM  7090)  which  have  simple 
array  accessing  mechanisms.  An  index  is  presumed  to  be  a simple  variable  whose 
value  must  be  retained  either  in  a register  or  in  memory  at  all  times.  Given  the  future 
index  requirements  of  a program,  the  allocation  of  the  index  registers  of  the  machine 
to  the  indices  is  considered.  When  all  of  the  index  registers  contain  values  that  will 
be  needed  again  later  in  the  program,  a decision  must  be  made  to  replace  one  of 
those  values  when  a new  index  is  required. 

Horwitz  considers  the  possibility  that  an  index  may  be  changed  while  it  resides 
in  a register.  If  an  index  is  changed  in  a register,  and  subsequently  that  register 
must  be  allocated  to  another  index,  the  changed  value  must  be  stored  in  memory.  If 
the  value  is  not  changed,  it  is  not  necessary  to  store  the  value  back  into  memory 
when  the  register  is  reallocated.  This  problem  is  analogous  to  the  problem  of  page 
replacement  in  a virtual  memory  system.  It  is  less  expensive  to  replace  a page  which 
has  not  been  changed  since  it  was  read  from  secondary  storage  because  a valid  copy 
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still  exists  elsewhere. 
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For  the  purpose  of  this  problem,  a program  can  be  considered  to  be  a 
sequence  of  steps  each  of  which  requires  a specific  index.  The  fact  that  there  may 
be  steps  in  the  program  that  do  not  require  indices  is  not  important.  Consider  the  set 
of  program  steps  and  associated  indices 


step  index 


where  x*  means  that  index  x is  changed  in  the  step  where  x*  appears.  When  a step 
calls  for  an  index,  that  index  must  be  in  one  of  the  index  registers.  The  other  index 
registers  may  contain  any  configuration  of  indices.  The  indices  in  the  other  index 
registers  may  or  may  not  be  in  a modified  state. 

We  may  construct  Jl  of  the  allowable  configurations  for  each  step  i,  l.e.  all 
combinations  of  n of  the  indices  used  by  the.  program  which  include  the  index 
required  by  step  i (where  n is  the  number  of  index  registers  available).  Consider  the 
configurations  to  be  nodes  in  a directed  graph  with  branches  from  each  configuration 
of  the  tth  step  to  each  configuration  of  the  i+lst  step.  Each  of  these  branches  can 
be  assigned  a weight  which  is  the  cost  of  making  the  change  in  configurations 
between  steps  i and  4+1  represented  by  the  branch.  The  cost  of  changing  between 
configurations  is  defined  as  the  number  of  memory  references  required  to  make  the 
change.  Thus  each  new  index  which  is  loaded  has  a cost  of  one.  Each  starred  index 
which  is  replaced  has  an  additional  cost  of  one.  Changir.j  an  unstarred  occurrence  of 
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an  index  to  a starred  occurrence  of  the  same  index,  or  replacing  an  unstarred  index 
require  no  memory  references  and  therefore  have  a cost  of  zero. 

Given  this  representation  of  the  poss  ble  allocation  of  index  registers,  fhe 
problem  becomes  one  of  finding  the  shortest,  i.e.  least  expensive,  path  through  the 
graph  from  the  first  step  to  the  last  step.  Although  there  are  several  algorithms  for 
finding  the  shortest  path  through  a directed  graph,  the  number  of  calculations 
required  for  other  than  a small  number  of  nodes  makes  these  solutions  impractical. 
Since  it  is  necessary  to  find  only  one  of  the  possibly  many  shortest  paths  through 
the  „raph,  we  may  restrict  attention  to  any  subgraph  which  contains  a shortest  path. 
The  bulk  of  the  Horwitz  paper  is  devoted  to  developing  properties  of  these  graphs 
which  lead  to  rules  for  eliminating  nodes  and  branches  from  consideration.  Horwitz 
proves  that  the  subgraph  obtained  by  applying  these  rules  does  contain  a shortest 
path,  and  gives  a procedure  for  finding  that  path.  Six  rules  are  given  for  generating 
the  subgraph  from  which  an  optimal  index  register  allocation  may  be  derived.  Define 
w(n1,n2)  to  be  the  cost  of  changing  the  configuration  from  that  of  node  n!  to  that  of 
node  n2.  Define  W(n')  to  be  the  weight  of  a node  given  by  minn(W(n)+w(n,n')>,  i.e.  the 
minimum  over  all  n of  the  sum  of  the  weight  of  n and  the  cost  of  changing  from  n to 
n'.  The  weight  of  the  initial  node  is  zero.  Given  these  definitions  we  may  summarize 
Horwitz’s  rules: 

Rule  1:  Generate  only  minimal  change  branches  and  eliminate  any  node 
which  has  no  minimal  change  branches  entering  it.  A minimal  change 
branch  is  defined  as  a branch  from  node  n at  step  i to  node  n'  at  step 
i+1  such  that  either  nodes  n and  n'  are  identical  or  n'  differs  from  n 
only  in  the  irdex  required  at  step  i+1. 

Rule  2:  If  nj  and  n2  are  nodes  of  step  i and  W(n1)+w(n1,n2)sW(n2), 
eliminate  n2. 
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Figure  1-1.  The  result  of  applying  Horwitz’s  Rule  1. 
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Figure  1-2.  Luccio’s  six  link  types. 
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Rule  3:  and  r)2  are  nodes  at  step  i which  differ  in  exactly  one  element. 

Let  zj  be  the  element  of  n^  which  is  replaced  by  Z2  in  n2.  Although 
the  exact  explanation  is  somewhat  more  complex,  the  idea  is  that 
node  n2  can  be  eliminated  when  W(nj)<W(n2>  and  in  the  future  zj  will 
be  used  before  Z2.  This  requires  the  ability  to  look  ahead  in  the 
program. 

Rile  A:  This  rule  is  a consequence  of  Rule  3 and  prevents  generation  of 
nodes  that  would  later  be  eliminated  by  Rule  3.  If  z^  and  Z2  are 
elements  of  a node  n at  step  i and  the  next  use  of  zj  comes  before 
the  next  use  of  Z2,  do  not  form  a node  at  step  t+1  which  replaces  Zj 
by  the  index  required  at  step  i+1. 

Rule  5:  Since  we  need  only  one  shortest  path,  generate  only  one  branch 
b into  each  node  a'  such  that  W(n')“W(n)+w(6). 

Rule  6:  If  a node  n of  step  i which  is  not  the  last  step  has  no  branches 
leaving  it,  eliminate  node  n. 

Figure  1-1  (reproduced  from  [Hor66])  shows  the  result  of  applying  Rule  1 to 
the  graph  of  the  example  program  above  when  there  are  two  index  registers 
available.  Step  0 is  added  to  indicate  the  initial  configuration  which  contains  two 
indices  not  used  in  the  program  (xg  and  Xg).  Each  branch  is  labeled  with  the  cost  of 
the  change  between  the  indicated  configurations  and  each  configuration  is  labeled 
with  the  minimum  cost  to  reach  the  configuration  from  step  0. 

Luccio  [Luc67]  showed  that  Horwitz’s  rules  may  restrict  the  graph  so  that  at 
some  steps  only  one  configuration  is  possible.  The  program  steps  before  and  after 
such  a step  may  be  treated  separately.  Luccio  neatly  describes  his  technique  in 
terms  of  link  diagrams.  Six  types  of  links  are  used  to  connect  various  combinations 
of  starred  and  unstarred  indices  (Figure  1-2  . Links  of  types  1,  2,  3,  and  A are  built 
whenever  a second  occurrence  of  an  index  is  seen.  Links  of  types  5 and  6 are  built 
following  occurrences  oi  starred  indices  and  are  maintained  up  to  the  current  step. 
These  are  called  temporary  links  since  they  will  be  changed  to  one  of  the  other 
types  when  a succeeding  occurrence  of  the  particular  index  is  encountered. 
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A link  is  said  to  cover  all  the  steps  along  :ts  extension  excluding  the  extremes. 
Only  the  first  extreme  is  excluded  for  temporary  links  so  that  they  cover  the  current 
step.  Luccio  gives  two  rules  for  changing  links  of  types  1,  2,  3,  or  4 to  links  of 
definite  allocation  (type  0).  The  index  corresponding  to  a link  of  type  0 must  be  kept 
in  its  register  throughout  the  entire  extension  of  the  link. 

If  there  are  N reg  ■ *ers  available  then 

1.  A link  L of  type  1 becomes  type  0 if  for  each  step  n covered  by  L the 
number  of  other  links  of  types  0,  1,  2,  3,  or  4 covering  k is  less  than 
N-l. 

2 A link  l of  type  2,  3,  or  4 becomes  type  0 if  for  each  step  k covered 
by  l the  total  number  of  other  links  covering  k is  less  than  N-l. 

When  the  number  of  type  0 links  covering  a step  k is  N-l,  the  configuration  for 
k is  fixed.  The  a registers  must  contain  the  N-l  indices  corresponding  to  the  type  0 
links  and  the  index  required  by  step  k.  At  such  steps  the  Horwitz  method  may  be 
applied  independently  to  the  preceding  and  succeeding  steps. 

The  Horwitz  method  is  related  to  Belady’s  algorithm  for  page  replacement  in  a 
virtual  storage  computer  [Bel66].  Beiady  shOvved  that  in  a paging  environment,  the 
page  to  be  replaced  should  be  the  page  whose  next  use  is  farthest  in  the  future.  In 
addition  he  noted  that  if  a page  has  not  been  written  into,  it  need  not  be  written  out 
(to  secondary  storage)  but  merely  deleted.  The  ability  to  determine  which  page 
(register)  is  next  used  farthest  in  the  future  depends  on  knowing  the  future  behavior 
of  a program. 
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Introduction 
1 .3.2  Evaluation  order 

Ikuo  Nakata  addressed  the  question  of  evaluation  order  in  his  paper  describing 

the  register  allocation  phase  of  a FORTRAN  compiler  for  the  HITAC-5020  [Nak67], 

Nakata  shows  that  the  order  of  evaluation  of  the  subexpressions  of  an  expression 

can  affect  the  number  of  temporary  values  that  are  required  at  any  one  time. 

Consider  the  expression  a*b+(c+d)/(e+f).  A straight  forward  code  sequence  to 

evaluate  this  expression  is: 

a * b •+  Rj 
c + d -*  R2 
e + f Rg 
R2/R3->  R2 

Suppose,  however,  that  this  expression  must  be  evaluated  on  a computer  with  fewer 
than  three  registers.  To  use  the  same  evaluation  order  with  only  two  registers 
available  would  require  that  one  of  the  intermediate  results  (namely  a*b)  be  stored  in 
some  temporary  memory  location,  On  the  other  hand,  by  changing  the  order  of 
evaluation  of  the  subexpressions,  the  expression  may  be  evaluated  using  only  two 
registers  and  without  storing  intermediate  results. 

c + d -»  Rj 
e + f R2 
R1/R2-+  Ri 
a * b ->  R2 
R2+Ri P ■> 

The  central  point  of  this  example  is  that  the  subexpression  (c+d)/(e+f)  requires 
two  intermediate  values.  Since  those  intermediate  results  are  not  needed  after  the 
division  is  pei  formed,  one  of  the  registers  may  be  used  to  compute  a*b.  Since  the 
result  of  the  evaluation  of  an  expression  occupies  only  one  register,  it  follows  that 
for  any  binary  operator,  the  operand  whose  evaluation  requires  the  larger  number  of 
registers  should  be  evaluated  first. 
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The  niK*1  er  of  registers  required  to  evaluate  the  expression  (a)  <op>  (b)  where 
(a)  and  (b)  are  arbitrary  expressions  and  <op>  is  some  oinary  operator  is  given  by 
the  following  analysis,  Let  L and  m be  the  number  of  registers  required  to  evaluate 
(a)  and  (o)  respectively.  If  either  (a)  or  (b)  contains  no  operators  (It  Is  a constant  or 
a simple  variable)  it  requires  zero  registers.  (Note  that  "require"  here  means  the 
minimum  number  of  registers  necessary  to  evaluate  an  expression  without  storing  any 
intermediate  results). 

There  are  two  cases: 

1.  (a)  can  be  evaluated  first  leaving  the  result  in  one  of  the  L 
registers  used.  Evaluation  of  (b)  will  require  one  more  than  the  p-1 
registers  remaining  giving  a total  of  p+1  registers  for  the  expression. 

2.  l+m\  max(/,m)“p.  In  this  case  the  operand  requiring  the  larger  number 
of  registers  is  evaluated  first  leaving  p-1  registers  for  the  other 
operand.  Since  the  other  operand  requires  at  most  p-1  registers  no 
additional  registers  are  needed  and  the  expression  can  be  evaluated 
using  only  p registers. 

In  both  cases  p is  a lower  bound  on  the  number  of  registers  required  and  p+1 
is  an  upper  bound.  In  case  1 p+1  is  a lower  bound,  and  in  case  2 p Is  an  upper 
bound. 

Nakata  gives  an  algorithm  for  labeling  the  nodes  of  a tree  with  the  number  of 
registers  required  for  evaluation  of  the  node,  Briefly,  this  algorithm  assigns  a label 
La  to  each  node  n of  the  tree  such  that  if  n is  leaf  then  Ln_0,  otherwise  the 
immediate  descendants  of  n have  labels  t and  r and  Ln*min(max((+1,  r),  mnx(f,  r+1)). 


Nakata’s  algorithm  for  code  production  involves  first  labeling  the  nodes  of  the 
tree  by  the  above  method,  and  then  beginning  at  the  root  node,  walking  through  the 
tree  generating  code  to  evaluate  the  expression  represented.  At  each  node  the 
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operand  requiring  the  larger  number  of  registers  is  evaluated  first.  If  the  operands 
require  the  same  number  of  registers,  the  left  operand  is  evaluated  first.  Nakata 
does  not  consider  formally  the  question  of  what  to  do  when  the  number  of 
simultaneous  temporary  values  exceeds  the  number  of  registers.  He  does,  however, 
offer  some  heuristics  for  deciding  which  temporary  value  should  be  stored.  On  most 
machines  the  left  operand  of  a division  or  subtraction  operation  must  be  in  a register, 
so  the  left  operand  of  these  operations  should  not  be  stored.  This  may  conflict  with 
the  other  assertion  that  the  value  to  be  stored  should  be  the  one  whose  use  is 
farthest  in  the  future,  but  Nakata  conjectures  that  the  efficiency  of  the  code 
produced  will  not  be  significantly  affected  by  the  choice  of  either  of  these  courses 
of  action. 


Using  a graph  theoretic  approach,  R.  R.  Redziejowski  [Red69]  later  proved  that 
Nakata’s  algorithm  does  use  the  minimum  number  of  registers.  Redziejowski 
transformed  Nakata’s  tree  into  a "lineup"  or  linear  sequence  of  vertices.  Each  vertex 
represents  a single  operation  in  the  tree  and  an  arc  is  drawn  from  vertex  a;  to  vertex 
y to  represent  a partial  result  which  is  computed  at  y and  used  at  *.  Choosing  a 
feasible  evaluation  order  is  equivalent  to  ordering  the  sequence  of  vertices  so  that 
vertex  y precedes  vertex  * if  there  is  an  arc  from  * to  y.  (This  is  equivalent  to 
requiring  that  any  partial  result  be  computed  before  it  is  used.) 

At  any  vertex  x the  number  of  partial  results  created  before  x and  used  after 
x is  represented  by  the  number  of  arcs  passing  over  vertex  x.  Redziejowski  calls 
this  number  the  width  of  the  lineup  and  develops  an  algorithm  for  producing  a lineup 
of  minimum  width.  Redziejowski’s  algorithm  is  in  principle  the  same  as  Nakata’s 
algorithm  and  therefore  Redziejowski’s  proof  of  his  algorithm  can  be  considered  as  a 


18 


An  Approach  to  Global  Register  Allocation 


formal  proof  of  Nakata’s  algorithm.  Redziejowski  generalizes  the  algorithm  to  include 
operators  with  more  than  two  operands. 

Sethi  and  Ullman  [Set70]  consider  the  more  general  problem  of  minimizing  the 
number  of  program  steps  and/or  the  number  of  storage  references  in  the  evaluation 
of  an  expression  with  a fixed  number  of  general  registers.  They  exploit  the 
associative  and  commutative  properties  of  operators  and  assume  that  all  elements  are 
distinct  (no  common  subexpressions)  and  that  there  are  no  non-trivial  relations 
between  operators  (e.g.  no  distributive  law). 

Nakata’s  tree  labeling  scheme  is  modified  slightly  to  account  for  commutative 
and  non-commutative  operators.  This  change  assigns  a label  of  one  rather  than  zero 
to  a leaf  node  which  is  the  left  descendant  of  its  ancestor.  The  change  means  that 
the  left  and  right  operands  of  a binary  operator  may  have  different  weights  and 
accounts  for  the  gains  which  may  be  made  by  exploiting  commutativity. 

First  considering  only  non-commutative  operators,  Sethi  and  Ullman  prove  that 
their  Algorithm  1 (which  is  essentially  Nakata’s  algorithm)  uses  the  minimum  number  of 
registers  as  well  as  the  minimum  number  of  loads  and  stores.  Since  the  number  of 
binary  operators  is  not  changed  by  the  allowed  transformations,  a program  which  has 
a minimum  number  of  loads  and  stores  has  a minimum  number  of  program  steps. 

In  Algorithm  2,  Sethi  and  Ullman  consider  commutative  operators  by  adding  a 
step  to  Algorithm  1 which  interchanges  the  left  and  right  descendants  of  a 
commutative  operator  when  the  left  descendant  is  a leaf  and  the  right  descendant  is  a 
non-leaf. 

Associativity  is  treated  only  in  conjunction  with  commutativity  since  in  practice 
most  associative  operators  are  also  commutative.  The  approach  used  by  Sethi  and 
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M"  is  to  make  the  associative-commutative  operators  into  n-ary  operators, 
reorder  the  operands  so  that  the  one  or  two  operands  requiring  the  largest  number 
of  registers  appear  on  the  left,  and  then  change  back  to  binary  operators  associating 
to  the  left.  This  is  conceptually  similar  to  Redziejowski’s  treatment  of  n-ary 
operators. 

Sethi  and  Ullman  prove  that  each  of  their  algorithms  generates  an  evaluation 
sequence  containing  the  minimum  number  of  loads  and  stores  under  the  assumptions 
of  the  algorithm.  They  then  show  that  this  leads  to  the  co1. elusion  that  the  algorithms 
also  minimize  the  number  of  storage  references. 

In  their  conclusion,  Sethi  and  Ullman  Doint  out  that  all  of  their  algorithms  can  be 
performed  in  time  proportional  to  the  numbtr  of  nodes  in  the  tree.  They  also  show 
that  the  algorithms  can  easily  be  modified  to  allow  operations  which  require  more 
than  one  register. 

Beatty  [Bea72]  recasts  the  ideas  of  Sethi  and  Ullman  in  terms  of  axiom  systems. 
Beatty  extends  the  Sethi-Ullman  algorithm  for  associative-commutative  operators  to 
include  the  unary  minus  and  its  relations  to  the  other  operators.  These  relations 
include  the  equalities 

a-b  - a+(-b) 

-<a*b)  ■-  (-a)*b 
-<a/b)  - (-a)/b  - a/(-b) 

Beatty’s  proof  of  minimality  is  considerably  more  complicated  than  the  Sethi-Ullman 
proof  due  to  the  properties  of  the  unary  minus, 

More  recently  Bruno  and  Sethi  [Bru74,  Set75]  have  shown  shat  the  register 
allocation  problem  for  straight  line  programs  is  polynomial  complete  when  .common 
subexpressions  are  not  recomputed.  The  specific  problems  considered  are  (1)  to  use 
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the  minimum  number  of  registers  without  storing  intermediate  results  and  (?)  to 
generate  the  minimum  length  code  for  a one  register  machine.  While  the  optimal 
register  allocation/evaluation  order  may  be  easily  determined  in  some  cases,  the 
results  of  Oruno  and  Sethi  tell  us  that  in  general  there  is  no  Known  nonenumerative 

solution. 

1,3.3  Global  assignment 

The  work  thus  far  discussed  has  dealt  only  with  the  question  of  optimal  use  of 
registers  in  expressions.  A paper  by  W.  H.  E.  Day  in  the  IBM  Systems  Journal  [Day70] 
considers  the  much  broader  problem  of  global  assignment  of  data  items  to  registers. 
Before  describing  Day’s  work,  it  is  necessary  to  explain  the  distinction  between  what 
Day  calls  global  assignment  and  what  he  considers  local  assignment.  Informally,  a 
local  assignment  is  one  which  makes  assignments  within  basic  blocks,  i.e.  without 
control  flow.  A global  assignment  considers  larger  contexts  which  include  control 
flow.  A more  formal  discussion  follows. 

Consider  a programming  language  L The  terminal  symbols  of  L are  delimiters, 
operators,  constants,  and  identifiers.  The  constants  and  identifiers  are  the  data  items 
of  L.  A program  in  L is  a sequence  of  statements;  a statement  is  a sequence  of 
terminal  symbols.  Statements  in  L are  either  descriptive  or  executable,  the  latter 
specifying  operations  to  be  performed  on  data  items.  A data  item  is  said  to  be 
defined  in  a statement  when  execution  of  the  statement  causes  a new  value  to  be 
assigned  to  the  data  item.  A data  item  is  referrtd  to  when  the  value  of  the  data  item 
is  required  for  correct  statement  execution. 

Let  P be  a program  in  L.  A basic  block  in  P is  an  ordered  subset  of  elements 
of  P which  intuitively  is  "straight  line  code,"  i.e.  a sequence  of  statements  which  can 
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only  be  entered  by  branching  to  the  first  statement  and  which  can  only  be  left  by 
branching  from  the  last  statement.  P ^ is  a representation  of  P as  an  ordered  set  of 
basic  blocks.  P ^ is  a representation  of  P as  a directed  graph  with  the  elements  of  P ^ 
as  the  vertices  and  a set  of  arcs  representing  the  flow  of  control  among  the  basic 
blocks  of  P.  A region  RL  is  a strongly  connected  subgraph  of  Ptf,  and  Pr  is  a 
representation  of  P as  an  ordered  set  of  regions: 

Pr  “ {Pj.^2 Rn) 

m lRi  I Rc  + Rj  for  * * 

Rl  (\Rj-<f>  or  Rl  c Rj  for  i < j, 

Rn  = PB  } 

A computer  has  a set  of  registers  G*  whose  elements  are  g^,  and  for  most 
situations  requiring  the  use  of  a register  any  available  g^  ( G*  may  be  assigned.  Let  d 
represent  an  element  of  P,  P^,  or  Pr  and  define: 

G'  - { gt  ! sL  « G*,  Si  is  available  for  assignment  everywhere  in  d } 

N'  ■ { | nt  is  a data  item  in  P,  rij  may  be  assigned  to  registers  in  d } 

Given  these  representations,  Day  offers  the  following  oefinitions: 

1.  A local  assignment  is  a {possibly  multi-valued)  mapping  of  N c N onto 
G c G'  for  d ( Pfr. 

2.  A global  assignment  is  a (possibly  multi-valued)  mapping  of  N£N 
onto  G c G'  for  d i Pr 

3.  A one-one  assignment  is  a one-one  mapping  of  N £ N'  onto  G £ C\  A 
one-one  assignment  defines  a one-to-one  correspondence  between  N 
and  G. 

4.  A many-few  assignment  is  a single-valued  mapping  of  N£N'  onto 
G £ G'  with  cardinality(N)  £ cardinality(G). 
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5.  A many-one  assignment  is  a many-few  assignment  in  which 
cardinality(G)  = 1. 

A data  item  is  active  at  a point  in  d if  it  may  be  referred  to  before  being 
defined  subsequent  to  that  point.  Two  data  items  interfere  in  d if  they  are  both 
active  at  some  point  in  d.  A necessary  condition  for  the  assignment  of  N c N to  g < 
G in  d is  that  nL  must  not  interfere  with  n.j  for  every  n.^,  ttj  < N,  i t j. 

Local  assignment,  as  defined  by  Dav,  occurs  entirely  within  basic  blocks  of  a 
program.  The  methods  described  by  Horwitz,  Nakata,  Sethi-Ullman,  and  Beatty 
provide  algorithms  which  may  be  used  to  obtain  optimal  local  assignments  under  the 
assumptions  dictated  by  those  authors.  Local  assignment  is  not,  however,  able  to 
cope  with  data  items  which  may  be  active  on  block  entry  or  exit. 

Global  one-one  assignment  partially  solves  the  problem  of  active  data  items  at 
block  boundaries  by  assigning  data  items  to  registers  throughout  an  entire  region. 
With  this  type  of  assignment,  precautions  need  be  taken  only  at  region  ooundaries  to 
assure  that  values  of  active  data  items  are  retained. 

Assigning  a data  item  to  a register  for  an  entire  region  may  lead  to  inefficient 
use  of  the  registers.  With  accurate  program  flow  information,  it  is  possible  to 
determine  the  points  at  which  a data  item  is  active.  When  the  active  points  of  all  data 
items  are  known,  a set  of  data  items  which  do  not  interfere  may  be  determined  and 
the  elements  of  that  set  assigned  to  the  same  register.  The  availability  of  complete 
and  accurate  flow  information  is  critical  to  efficient  use  of  global  many-one  or  many- 
few  assignments.  In  the  absence  of  flow  information,  mariy-one  and  many-few 
assignments  degenerate  to  one-one  assignments. 

Day  formulates  global  one-one,  many-one,  and  many-few  assignment  problems 
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as  integer  programming  problems.  He  makes  the  intuitively  reasonable  assumption 
there  is  some  profit  (>0)  associated  with  the  assignment  of  a data  item  to  a register 
and  that  this  profit  depends  on  the  frequency  and  context  of  the  use  of  the  data  item. 
Day  gives  several  algorithms  for  solving  the  assignment  problems.  Some  of  these 
give  optimal  results  while  others  may  produce  non-optimal  feasible  results  at  a much 
lower  cost  in  computational  complexity.  Jay’s  formulations  of  the  problems  are 
summarized  below. 

The  global  one-one  assignment  is  the  simplest  of  the  three  problems  since  no 

interference  data  is  required.  Refering  to  the  definition  of  a one-one  assignment  let 

n m cardinality(N')  and  m ■ cardinality(C')  and  let  p be  a vector  of  profits  such  that  p ^ 

is  the  profit  associated  with  assigning  nL  i N'  to  a register.  Vector  * Is  a selection 

vector  such  that  *•  - 1 if  nL  i N'  is  assigned  to  a register,  otherwise  xj>  - 0.  Let  I be 

a vector  of  l*s  of  appropriate  size  so  that  lx  produces  the  sum  of  the  elements  of  x; 

then  the  problem  is 

maximize  z - px 

subject  to  lx  S nt 

where  * j < {0,  1}  and  Pj  > 0 

The  solution  to  the  one-one  assignment  is  simple:  assign  the  m data  items  with  the 
largest  profits  to  registers. 

The  global  many-one  assignment  problem  is  similar  to  the  one-one  problem 
except  for  the  added  restriction  that  no  two  data  items  which  are  assigned  to  the 
register  may  interfere.  Day  expresses  this  condition  in  terms  of  a matrix  of  data  item 
interference  values  (C  | cLj  - 1 if  nit  nj  iN\L  + j interfere;  cLj  - 0 otherwise). 

The  many-few  assignment  problem  is  an  extension  of  the  many-one  assignment 
problem  to  more  than  one  register.  The  problem  is  to  select  the  best  combination  of 
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many-one  assignments.  Day  explicitly  excludes  multi-valued  mapping  which  mij,ht 
assign  a single  data  item  to  different  registers  at  different  points  in  a region. 

In  his  conclusion,  Day  reports  the  results  of  several  tests  of  the  actual 
execution  characteristics  of  his  algorithms  for  many-few  assignment.  The  OPTSOL 
algorithm  (which  provides  an  optimal  solution)  requires  much  longer  execution  time  for 
relatively  little  gain  over  the  estimating  algorithms.  (Sample  values:  for  one  register 
and  48  data  items  t(optimal)  ■ 6 sec.,  t(estimate)  = 0.06  sec.).  The  total  profits 
produced  by  the  estimating  algorithms  are  consistently  greater  than  90 7.  of  the  profit 
produced  by  the  OPTSOL  algorithm  and  are  significantly  better  than  a one-one 
solution  to  the  same  problem.  Day  concludes  that  his  algorithms  are  sufficiently  fast 
to  be  included  in  an  optimizing  compiler. 

1.4  Approach  to  the  problem 

In  order  to  build  the  optimizing  compiler  mentioned  earlier,  it  is  necessary  to 
have  a general  overall  model  of  the  resulting  compiler.  Once  we  have  this  model  we 
can  divide  the  task  into  subproblems  along  the  lines  of  the  phases  of  the  resulting 
compiler  and  attack  the  subproblems  individually.  The  overall  structure  of  the 
compiler  presumed  in  this  thesis  is  the  decomposition  of  the  Bliss-11  compiler 
[Wul75].  The  Bliss-11  compiler  is  decomposed  into  five  major  phases: 

1.  LEXSYNFLO  --  lexical,  syntactic  and  global  flow  analysis. 

2.  DELAY  --  Program  tree  manipulation.  Replacement  of  some  nodes  by 
simpler  but  equivalent  nodes.  Determination  of  evaluation  order  and 
target  paths.  General  decisions  about  the  code  to  be  produced. 

3.  TNBIND  --  Allocation  of  registers  and  other  temporary  storage. 

4.  CODE  --  The  actual  code  generation. 

5.  FINAL  --  Peephole  optimization  and  preparation  of  the  code  for  final 
output. 
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In  this  thesis  we  will  consider  the  TNBIND  phase  of  the  compiler.  The  name 
TNBIND  comes  from  Temporary  Name  BINDing.  A temporary  name  (TN)  is  a name 
assigned  to  any  location  to  be  used  as  temporary  storage.  A unique  name  is  assigned 
for  each  logically  distinct  entity,  although  several  names  may  represent  the  same 
physical  location  in  the  final  code  stream.  It  is  assumed  that  the  DELAY  phase  has 
made  the  evaluation  order  and  target  path  decisions  described  above.  TNBIND  must 
make  the  actual  bindings  of  TNs  to  locations  after  considering  the  other  issues.  Of 
particular  importance  are  the  machine  requirements  and  characteristics.  Working  from 
the  tree  representation  of  the  source  program  TNBIND  determines  the  number  and 
context  of  the  uses  of  each  TN  and  decides  how  to  bind  the  TNs  to  the  available 
locations  so  as  to  produce  the  'best"  output  code.  There  are  two  basic  goals  in 
studying  the  TNBIND  phase  of  the  generalized  optimizing  compiler  structure.  (1)  We 
want  to  formalize  some  of  the  actions  in  a phase  of  compilation  that  is  usually  a 
collection  of  unrelated  algorithms  at  best,  and  completely  ad  hoc  at  worst.  (2)  We 
also  want  to  make  the  transition  from  a phase  of  a compiler  for  a particular 
language/machine  combination  to  a general  model  of  the  temporary  storage  problem. 
The  TNBIND  model  will  include  the  assignment  of  TNs  for  user  variables,  and  user 
variables  will  be  considered  as  equal  competitors  with  the  compiler  generated 
temporaries  in  the  allocation  of  machine  resources.  The  model  will  also  place  a great 
deal  of  emphasis  on  accurately  determining  the  interaction  of  the  TNs  with  control 
flow  in  the  program.  The  restrictions  placed  on  the  final  bindings  of  TNs  to  locations 
which  arise  from  the  machine  requirements  and  the  run  time  environment  will  be 
considered  in  a general  way  so  that  changes  in  the  machine,  the  language  or  the 
implementation  can  easily  be  incorporated  into  a new  compiler. 

The  thesis  can  be  seen  as  an  extension  of  the  work  described  in  [Wul75].  We 
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take  the  relationship  of  TNBIND  to  the  other  phases  of  compilation  as  defined  by  the 
Bliss-11  decomposition  to  be  the  correct  relationship.  From  that  point  we  expand  the 
TNBIND  idea  to  a general  model  of  temporary  name  binding  applicable  to  a large  class 
of  languages  and  machines.  We  also  consider  new  algorithms  for  the  solution  of  two 
specific  subproblems  of  the  TNBIND  model.  The  goal  is  to  show  how  to  produce  a 
TNBIND  phase  of  a compiler  when  presented  with  the  language  and  the  characteristics 
of  the  target  machine.  It  is  not  proposed  that  the  result  of  the  research  reported 
here  should  be  a piece  of  a running  compiler  compiler  system.  Rather  we  will  present 
a notation  for  describing  a general  model  of  a solution  of  the  problem  and  indicate 
how  the  specifics  of  a particular  language  or  machine  may  be  incorporated  in  the 
model.  Though  it  may  be  somewhat  of  an  understatement,  the  step  from  the  model 
presented  here  to  the  corresponding  piece  Of  a compiler  compiler  is  "merely  a matter 
of  implementation." 

The  following  description  is  an  overview  of  the  implementation  suggested  by 
the  model.  We  assume  here  that  our  only  choices  for  assigning  TNs  are  registers  and 
main  memory.  Each  expression  which  must  produce  a value  and  each  user  variable  is 
assigned  a unique  TN.  Two  values  are  calculated  for  each  TN:  the  cost1,  of  accessing 
the  TN  if  it  is  assigned  to  a register  and  the  same  cost  assuming  the  TN  is  assigned  to 
a location  in  main  memory.  We  also  collect  for  each  TN  a list  of  all  points  in  the 
program  at  which  the  value  of  the  TN  must  actually  exist  in  the  assigned  location. 
The  optimal  binding  of  tho  TNs  to  the  available  locations  is  the  one  which  produces  a 
minimum  cost  program  (i.e.  the  sum  of  the  costs  of  each  TN  for  the  type  of  location  to 
which  it  is  bound  is  minimized)  and  no  two  TNs  which  must  contain  valid  values  at  any 
one  point  are  bound  to  the  same  location.  The  really  hard  parts  of  the  problem  are 


t in  terms  of  code  size,  memory  references,  etc. 
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determining,  accurately,  the  points  at  which  any  given  TN  must  contain  a valid  value 
and  selecting  a set  of  bindings  to  minimize  cost  without  a complete  enumeration  of  all 
feasible  bindings.  By  changing  the  exact  measures  of  cost,  we  can  change  the 
emphasis  of  the  optimization.  If  the  cost  minimization  procedure  is  effective  we  will 
produce  at  least  very  good,  if  not  optimal,  c.'de  as  defined  by  our  cost  measures. 

1.5  Thesis  organization 

Chapter  2 provides  a detailed  description  of  the  model.  This  Includes  the 
decomposition  of  the  problem  into  several  subproblems  Including  cost  computations, 
lifetime  determinations  and  the  actual  binding.  Chapte'  3 gives  a notation  for 
describing  the  pertinent  facts  about  the  language  and  the  target  machine  as  they 
relate  to  register  allocation.  This  is  the  place  where  implementation  decisions  (e.g. 
subroutine  call-return  conventions)  and  machine  specific  information  are  encoded. 
The  machine  specific  information  needed  here  is  not  a description  of  each  opcode, 
but  rather  more  general  information  such  as  the  relative  cost  of  accessing  registers 
and  memory  and  what  Kinds  of  locations  may  (must)  be  used  to  hold  the  operands  of 

various  operators. 

Chapters  4 and  5 describe  solutions  to  two  specific  subproblems:  the 

determination  of  lifetimes  (also  Known  as  free-busy  or  live-dead  analysis)  and  the 
problem  of  binding  a large  number  of  temporary  locations  to  the  limited  physical 
resources  of  the  target  machine.  Chapter  6 discusses  the  reimplementation  of  the 
TNBIND  module  of  the  Bliss-11  compibr  as  a test  case  of  the  thesis.  Chapter  7 
reviews  the  model  and  considers  the  possible  directions  for  future  research. 


Chapter  2 

A View  of  Global  Register  Allocation 

This  chapter  will  present  a description  of  the  TNBIND  model  of  global  register 
allocation.  Global  register  allocation  as  used  here  means  making  decisions  about 
which  register  (or  other  location)  will  be  used  to  hold  a particular  value  by 
considering  a context  larger  than  a single  expression  or  statement.  While  it  will  not 
always  be  explicit  in  the  following  description,  the  intent  of  the  model  is  to  make  aii 
derisions  in  the  context  of  a single  subroutine.  The  model  could  be  expanded  to 
consider  an  entire  c^.  ipilation  or,  with  suitable  intermediate  storage  of  data,  a set  of 
compilations,  but  the  subroutine  is  the  unit  of  program  frequently  considered  by 
other  optimizations  and  is  large  enough  to  provide  the  opportunity  for  interesting 
global  decisions. 

2. 1 The  global  register  allocation  problem 

Most  register  allocation  done  in  actual  compilers  is  local.  As  defined  in  Chapter 
1,  a local  allocation  is  an  allocation  done  entirely  within  a basic  block.  This  type  of 
allocation  is  much  easier  than  a global  allocation  which  makes  allocations  within  a 
region.  It  is  possible  that  an  entire  program  may  be  a single  basic  block,  in  which 
case  local  and  global  allocations  are  identical,  however  such  programs  are  a small 
minority  and  are  rather  uninteresting.  In  order  to  do  the  global  allocation,  we  must 
have  more  information  about  the  control  and  data  flow  of  the  program.  In  traditional 
compilers  this  information  is  not  available  at  the  time  register  allocation  must  be  done. 

Traditional  register  allocation  methods  such  as  those  described  by  Hopgood 
[Hop69]  operate  on  sequences  of  machine  instructions.  The  instructions  come  from 
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the  code  generator  with  the  necessary  registers  specified  symbolically.  The  register 
allocator  then  associates  the  symbolic  names  with  the  actual  registers  of  the  machine. 
A local  allocation  is  the  logical  choice  in  this  situation  because  the  information 
necessary  to  make  global  decisions  is  difficult  to  obtain  from  the  machine  code.  Thus 
the  model  of  compiling  that  places  code  generation  bafore  register  allocation  has  to 
some  extent  dictated  the  use  of  local  allocation  methods. 

The  TNBIND  model  differs  from  the  traditional  view  by  placing  the  register 
allocation  before  the  actual  code  generation.  At  this  point  the  traditionalists  will  cry 
"How  can  you  assign  registers  before  you  know  what  instructions  are  to  be  used?" 
The  answer  lies  in  the  fact  that  the  TNBIND  model  considers  registers  as  more 
general  than  just  a necessary  part  of  a machine  instruction.  This  is  reflected  in  the 
choice  of  the  term  "temporary  name"  rather  than  "register"  to  refer  to  the  entities  In 
question.  A temporary  name  is,  quite  simply,  the  name  of  a place  that  can  be  used  to 
store  information.  It  is  possible  to  Identify  the  values  (information)  that  must  be 
computed  by  looking  at  the  parse  tree  of  a program.*  If  we  couple  the  knowledge  of 
where  these  values  are  computed  and  where  they  are  used  with  some  basic 
knowledge  about  the  machine,  we  can  assign  actual  locations  to  hold  each  of  the 
values  without  ever  knowing  the  exact  sequence  of  instructions  that  will  be  needed 
to  perform  the  computation.  Indeed  it  may  be  the  case  that  we  cannot  decide  on  the 
exact  instructions  needed  until  we  have  determined  whether  certain  values  are  being 
held  in  registers  or  not,  Thus  there  is  somewhat  of  a "chicken  and  egg"  flavor  to  the 
problem,  On  the  one  hand  we  can  argue  that  we  cannot  assign  registers  until  we 
know  what  instructions  are  to  be  used,  and  on  the  other  hand  we  argue  that  we 


t In  the  case  of  a language  construct  that  is  not  closely  represented  by  the  basic 
hardware  of  the  machine,  it  may  be  necessary  to  add  some  information  to  the  nodes 
of  tho  parse  tree  during  semantic  analysis. 
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cannot  decide  on  the  best  sequence  of  instructions  until  we  Know  which  registers  will 
be  used  and  which  operands  will  be  in  registers. 

The  global  allocation  problem  is  hard.  Sethi  [Set75]  has  shown  that  the  general 
problem  is  hard  even  for  straight  line  programs.  Not  only  is  the  problem  of  finding  a 
good  (i.e.  near  optimal)  allocation  polynomial  complete,  but  merely  datermining 
whether  a given  function  which  associates  registers  with  program  nodes  is  a valid 
allocation  is  also  a polynomial  complete  problem. 

2.2  Local  variables 

The  TNBIND  model  treats  user  declared  local  variables  in  the  same  way  it  treats 
temporary  storage  needed  for  expression  evaluation.  This  is  a logical  extension 
since  the  programmer  uses  the  abstract  computational  facility  provided  by  the 
language  in  the  same  way  the  compiler  uses  the  facilities  provided  by  the  target 
machine.  The  programmer  expresses  his  operations  in  terms  of  the  language 
primitives  just  as  the  compiler  expresses  the  language  operations  in  terms  of  the 
hardware  primitives.  The  programmer  uses  his  local  variables  to  temporarily  hold 
intermediate  results.  By  treating  the  local  variables  like  compiler  generated 
temporary  storage  the  programmer  reaps  the  benefits  of  Keeping  some,  if  not  all,  of 
his  results  in  registers.  The  TNBIND  model  differs  from  other  views  of  register 
allocation  by  declaring,  at  the  outset,  that  user  variables  will  compete  on  equal  basis 
with  compiler  variable1;  for  use  of  the  registers. 
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2.3  Input  to  TNB1ND 


Mos,  register  allocation  algorithms  accept  a skeleton  code  stream  as  input.  The 

skeleton  code  — -**"  * 66  8e",r‘’ed  **  ^ 

register  fields  tilled  in  symbolically.  It  is  the  function  of  the  allocator  to  map  the 

symbolic  names  into  physical  assignments.  Some  authors  have  made  a d.stincbon 

between  ePo^hon  and  assign,  the  former  merely  reserving  one  ot  a number  o. 

registers  and  the  latter  deciding  which  particular  register.  In  the  TNB.ND  mode,  we 

use  the  terms  interchangeably.  A program  may  a**  — * 

physically  aval, able,  bu,  it  wit,  no,  ^ more  registers  than  are  physically 

sveitabte.  Because  many  ot  the  OP performed  in  the  TN9IND  mode,  ere 

machine  independent,  having  to  deal  with  actual  machine  instructions  introduces 
unnecessary  complication,.  TNBiND  operates  on  a program  tree  with  machine 

dependence  being  supplied  pa catty  ,0  its  subphas.s  as  repaired.  Each  nod.  ot 

the  tree  which  produces  a real  value*  has  been  assigned  a uni*..  temporary  name 
,TN,  to  hold  the  value.  A TN  is  a symbolic  representation  ot  the  location  which  w,„ 
be  used  ,0  hold  the  value  a,  run  time.  TNs  are  generated  trom  an  in, ini, . pool, 
although  they  wit,  eventually  have  to  be  mapped  onto  the  tinite  resources  o,  the 
targe,  machine.  Local  variables  and  common  subexpressions  are  also  essign.d  TNs. 
TNs  which  have  particular  restrictions  on  the  type  o,  location  ,0  which  they  may  be 
ultimately  bound  carry  an  indication  o,  this  restriction.  The  restrictions  are  the  result 
o,  ,h.  machine  repuirements  and  the  implementation  decisions  as  discussed  in  Chapter 

, the  case  o,  us hies,  the  user  may  have  restricted  the  options  available 

,o  the  compiler,  e.g.  the  Btfss-U  compiler  allows  the  programmer  ,0  declare  tha,  a 

tTT^TWs,  be  an  actual  bit ^ 

* - " - • ,iow  p‘lh- 
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loca1  variable  must  (or  must  not;  be  Kept  in  a register.  The  restrictions,  may  include 
concepts  such  as  "must  be  a register,"  "must  not  be  a register,"  "must  not  be  register 
0,"  or  "must  be  an  even-odd  register  pair."  These  restriction  decisions  are  made  by 
an  earlier  compilation  phase  which  has  knowledge  of  the  semantics  of  the  language  as 
well  as  the  characteristics  of  the  machine.  Earlier  phases  have  also  decided  the 
evaluation  order  of  subexpressions  and  have  made  the  target  path  decisions.  A 
global  flow  analysis  phase  has  identified  common  subexpressions  and  noted  the 
points  at  which  each  such  expression  must  actually  be  evaluated.  There  may  be  more 
that  one  distinct  common  subexpression  identified  from  each  set  of  formally  Identical 
expressions  [Ges72,  Wul75], 

Figure  2-1  shows  the  structure  of  the  input  to  TNBIND  for  an  assignment 
statement.  Each  operator  node  contains  the  operator,  a TN  (which  may  be  null),  and 
pointers  to  the  operands.  In  the  figure  the  target  path  operands  are  marked  with  the 
symbol  "®"  which  appears  in  the  pointer  field.  The  leaf  nodes  of  the  tree  structure 
are  variables  and  literals.  The  leaf  nodes  which  represent  user  variables  to  be 
treated  by  TNBIND  also  contain  TNs.  In  the  figure,  the  variables  "A"  and  "3"  have 
been  assigned  TNs. 

2.4  Actions  of  TNBIND 

Given  the  input  described  above,  the  function  of  TNBIND  becomes  one  of 
bind'ng  the  TNs  to  actual  storage  lociiions  (reg.sters,  memory,  etc.).  This  proceeds 
in  several  subphases  as  shown  in  Figure  ?-2.  In  the  figure  each  box  represents 
one  of  the  subphases.  Boxes  stacked  vertically  represent  independent  processes 
which  may  be  done  in  parallel.  Each  element  of  a vertical  stack  is  dependent  on  one 
or  more  of  the  elements  of  the  stack  on  its  left.  Thus  the  vertical  dimension 
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represents  potential  parallelism  while  the  horizontal  dimension  represents  required 
sequential  execution.  Each  of  the  subphases  in  Figure  2-2  is  described  below. 


Figure  2-2.  The  sebphases  of  TNBIND. 


2.4.1  Targeting  and  Preferencing 

Targeting  is  the  process  of  trying  to  cause  each  result  to  be  computed  in  the 
place  the  result  will  be  needed,  thereby  eliminating  non-productive  data  moves.  The 
values  of  subexpressions  are  targeted  or  directed  toward  producing  a final  result  in 
the  TN  in  which  the  result  will  eventually  reside.  The  targeting  process  attempts  to 
take  advantage  of  the  opportunities  recognized  in  the  target  path  decision  (Section 
1.2.2). 

The  process  is  carried  out  in  an  execution-order  tree  walk.  Each  node  passes 
its  own  TN  to  its  target  path  subnode  saying,  in  effect,  "This  is  where  I would  like 
you  to  leave  your  result."  The  subnode  considers  the  desirability  of  generating  its 
result  in  the  target  TN.  The  considerations  include  whether  the  subnode  is  a common 
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subexpression  or  a user  variable  anti  whether  the  restrictions  on  the  two  TNs 
(ancestor  and  descendant)  are  compatible.  If  the  target  TN  Is  a reasonable  place  to 
compute  the  result,  the  two  TNs  are  "bound  together,"  i.e.  they  will  henceforth  be 
considered  to  be  the  same  TN.  We  refer  to  this  successful  operation  as  "making  the 
target."  The  ancestor  is  Informed  of  the  decision  made  at  the  subnode.  If  the 
subnode  could  not  make  its  target,  the  ancestor  has  the  option  of  asking  that  some 
preference  be  given  to  assigning  the  two  TNs  to  the  same  physical  location  in  the 
later  packing  phase.  The  operation  of  expressing  this  preference  is  called 
preferencLng.  There  may  be  several  degrees  of  preference  forming  a spectrum  from 
weak  preference  to  strong  preference  to  actual  binding  together 

2.4.2  Data  gathering 

In  parallel  with  the  targeting  and  p-eference  class  operations,  the  tree  nodes 
are  numbered  and  a flow  graph  is  built.  (Flow  information  collected  during  the  flow 
analysis  phase  may  not  be  accurate  since  later  phases  may  have  changed  the  order 
of  expression  evaluation.  Flow  information  collected  during  flow  analysis  may  be 
used  If  the  data  collected  is  sufficiently  robust  to  provide  the  Information  needed  by 
TNBIND.  It  is  not  important  to  know  how  the  information  is  gathered.)  The  flow  graph 
Is  represented  as  a sequence  of  linear  blocks.  A linear  block  Is  the  largest  piece  of 
the  program  having  one  entry,  one  exit  and  no  internal  branches.  Each  linear  block 
also  has  pointers  to  each  of  its  possible  successors. 

A record  is  kept  for  each  TN  Indicating  at  which  nodes  the  TN  Is  referenced. 
The  references  are  separated  into  two  classes:  those  that  replace  the  value  with  an 
unrelated  new  value  and  those  that  use  or  modify  the  value.  Changing  one  field  of  a 
variable  which  contains  several  packed  fields  is  considered  a modification.  For  the 


r 


36 


An  Approach  to  Global  Register  Allocation 


purpose  of  this  record,  each  node  is  assigned  a Linear  order  number  (Ion).  (The  Lon 
values  will  be  more  precisely  defined  in  Chapter  A.)  This  information  Is  used  together 
with  the  linear  block  information  to  determine  at  which  nodes  in  the  program  the  TN 
must  contain  its  proper  value.  The  determination  can  be  made  by  a machine 
independent  procedure  which  will  be  described  in  Chapter  4. 

Another  item  of  information  recorded  for  each  TN  is  the  cost  of  Its  use.  This 
cost  is  a function  of  the  number  of  instructions,  memory  cycles,  etc.  which  are 
required  to  perform  the  specified  operations  on  the  TN.  A separate  cost  is  Kept  for 
each  class  of  storage  to  which  the  TN  may  be  bound  so  that  the  savings  in  program 
size/speed  resulting  from  any  particular  binding  may  be  evaluated.  The  cost  function 
may  include  some  frequency  of  execution  data  depending  on  what  information  is 
available  from  the  source  language.  The  specification  of  the  cost  functions  Is 
essentially  what  determines  the  optimization  criteria.  In  general  a cost  may  have  two 
parts:  a static  cost  Cs  and  a dynamic  cost  Cd.  Cs  represents  the  cost  in  terms  of 
code  space.  Cd  represents  the  cost  of  performing  the  access  and  is  multiplied  by  the 
ni  mber  of  times  the  access  is  performed.  Thus  the  basic  flavor  of  the  optimization 
can  be  changed  by  changing  the  relative  values  of  Cs  and  Cd.  Increasing  Cg  relative 
to  Cd  tends  to  optimize  for  minimum  code  size  while  increasing  Cd  relative  to  Cs 
tends  to  optimize  for  minimum  execution  time  (not  necessarily  implying  an  increase  in 
code  size).  For  any  access  a which  is  performed  na  times  the  cost  Is  Cc  + nQCd. 
total  cost  for  a TN  is  given  by  the  expression 

Y1  (Cs  + naCd) 

a 

In  practice  it  may  be  necessary  to  use  an  approximation  for  the  number  of  executions 
of  each  access.  When  the  data  on  branch  probabilities  is  available  it  Is  considered  in 
calculating  na. 
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2.4.3  Lifetime  determination 

In  using  the  concept  of  TN  lifetimes,  we  recognize  the  fact  that  in  general  it  is 
not  necessary  to  retain  the  value  of  any  TN  throughout  the  execution  of  the  program; 
it  is  only  necessary  that  the  value  be  available  when  required.  The  analysis  of 
lifetimes  has  been  given  many  names.  Day  [Day70]  calls  a data  item  "active"  If  It  may 
be  referenced  before  It  is  redefined.  Allen  has  discussed  "llve-aead  analysis" 
[AII71b].  Other  authors  have  referred  to  variables  as  being  "free"  or  "busy"  [Low69]. 
There  is  a basic  difference  In  the  emphasis  of  the  analysis  done  by  these  authors  and 
the  analysis  required  by  the  TNbi'ID  model.  Past  work  has  been  concerned  almost 
exclusively  with  determining  whether  it  is  feasible  or  desirable  to  keep  a copy  of  a 
variable  in  a fast  register  over  some  part  of  a program.  The  TNBIND  model  Is  geared 
toward  keeping  TNs  permanently  in  registers  while  identifying  the  points  In  the 
program  at  which  the  register  may  also  be  used  to  hold  other  values.  A detailed 
description  of  the  lifetime  ar  d the  method  of  determining  it  is  presented  in  Chapter  4. 

2.4.4  Importance  ranking 

After  the  information  described  above  has  been  collected  the  TNs  are  ordered 
according  to  their  relative  "importance."  This  is  a measure  of  how  Important  It  is  for 
to  program  optimization  to  bind  a given  TN  to  a particular  kind  of  location.  The 
importance  of  a TN  is  a function  of  the  sum  of  Cg  and  Cjj  for  each  access  as  well  as 
the  restrictions  on  the  binding  of  the  TN.  Those  TNs  restricted  to  a single  storage 
class  are  given  the  highest  priority  to  be  bound  to  that  storage  class.  The  ranking 
phase  is  really  nothing  more  than  the  creation  of  a number  of  sorted  lists  which  will 
serve  as  input  to  the  packing  phase.  The  purpose  of  the  ranking  Is  to  select  the 
order  in  which  the  TNs  will  be  considered  by  the  packing  phase.  The  most  Important 
TNs  will  be  considered  first. 


/ 


38 


An  Approach  to  Global  Register  Allocation 


2.4.5  Packing 

Tht.  packing  phase  does  the  actual  assignment  of  locations  to  the  TNs  (or  TNs 
to  locations).  A perfect  assignment  is  one  which  satisfies  all  of  the  restrictions  and 
preferences  and  assigns  each  TN  to  a member  of  the  most  desirable  of  its  allowed 
storage  classes.  In  practice  it  is  not  always  possible  to  make  such  perfect 
assignments.  The  function  of  the  packing  phase  in  this  case  is  to  minimize  the 
increase  in  cost  over  the  perfect  solution.  When  it  is  not  possible  to  assign  all  TNs  to 
their  desired  locations,  we  attempt  to  select  a minimum  cost  solution  from  the  set  of 
all  possible  assignments.  This  set  may  be  very  large.  If  there  are  k classes  of 
storage  to  which  TNs  may  be  assigned  and  there  are  n TNs,  then  there  are  kn 
possible  assignments.  Chapter  5 describes  how  this  solution  space  may  be  searched 
efficiently  for  a minimum  cost  solution. 


2.5  Summary  of  the  model 

The  TNBIND  model  Is  summarized  by  the  following  statements! 

1.  A unique  name  (TN)  is  generated  for  each  entity  for  which  the 
compiler  must  choose  a physical  location. 

2.  Using  knowledge  available  from  the  program  tree  and  knowledge 
about  the  target  machine,  costs  and  patterns  of  use  are  determined. 

3.  TNs  which  should  be  bound  to  the  same  location  are  identified.  A 
record  is  kept  when  it  is  desirable  (from  the  optimization  viewpoint) 
that  two  TNs  share  the  same  location,  but  such  sharing  may  not  be 
consistent  with  the  semantics  of  the  program  or  may  not  be  the  best 
decision  in  a more  global  context. 

4.  Lifetimes  for  each  TN  are  determined  from  the  use  patterns  and 
program  flow  information. 

5.  Finally,  TNs  are  bound  to  physical  locations  such  that  the  cost  due  to 
inability  to  make  perfect  assignments  is  minimized. 


Chapter  3 

Describing  the  Language  and  Machine 


The  TNBIND  model  relies  on  attributes  of  the  target  machine.  It  can  be  adapted 
to  a new  target  machine  by  respecifying  machine  attributes  including  the  types  and 
number  of  temporary  storage  locations  to  be  used  and  the  characteristics  of  the 
various  operations  that  the  target  machine  can  perform.  The  model  treats  programs 
at  a much  higher  level  than  traditional  register  allocation  methods.  Since  the  details 
of  the  instruction  set  of  the  target  machine  are  not  explicitly  encoded  in  the  model, 
the  respecification  needed  to  effect  a change  of  targot  machine  Is  the  relationship  of 
target  machine  capabilities  to  language  coi.'tructs  rather  than  the  format  and 
semantics  of  the  instructions. 

The  various  language  dependencies  and  machine  dependencies  affect  the  first 
phases  of  the  TNBIND  process.  There  is  a specific  routine  for  each  node  type. 
These  routines  perform  the  targeting,  preferencing  and  data  gathering  functions  for 
all  of  the  TNs  in  the  program  unit  being  compiled.  As  mentioned  in  Chapter  2,  we  will 
assume  that  the  unit  of  compilation  is  the  subroutine.  TNBIND  considers  the  nodes  of 
the  program  tree  in  execution  order.  That  is,  the  first  node  considered  is  the  first 
node  for  which  code  will  be  generated.  In  a tree  representation  of  the  program,  the 
first  nodes  to  be  considered  are  the  leaves.  The  descendants  of  a node  represent 
the  computations  which  must  be  performed  before  the  computation  specified  by  the 
node  itself.  TNBIND  accomplishes  the  execution  order  examination  of  the  modes  by 
using  a recursive  tree-walk  algorithm.  At  each  node  the  algorithm  is  invoked  to 
examine  the  subnodes.  When  the  algorithms  is  invoked  on  a target  path  subnode,  a 
target  TN  is  passed.  When  the  bottom  of  the  tree  is  reached,  the  necessary  TN 
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processing  for  the  leaf  node  is  performed,  and  the  result  is  returned  to  the  ancestor 
node.  When  all  of  the  descendants  of  a node  have  been  processed.  The  node  itself 
is  processed  and  then  returns  to  its  ancestor.  In  order  to  clarify  this  structure,  we 
present  below  a few  examples  of  the  node  types  and  the  actions  necessary  to 
process  them. 

3.1  A typical  binary  operator 

3.1.1  The  necessary  functions 

Let  us  consider  the  actions  that  are  necessary  during  the  TNBIND  processing  of 
a typical  binary  operator.  We  will  assume  that  this  operator  is  one  which  is 
represented  directly  by  the  hardware,  i.e.  there  is  a hardware  instruction  which 
implements  the  basic  operation.  For  example,  t: . "ADD"  instruction  directly 
implements  the  "+"  operator.  The  TNBIND  processing  is  identical  in  form  to  the  actual 
code  generation  process.  By  this  we  mean  that  the  code  generator  traverses  the 
tree  in  the  same  manner  in  order  to  generate  code.  The  difference  is  that  the 
TNBIND  phase  does  not  involve  the  actual  instructions,  bu*  rather  the  control  flow  and 
the  number  and  Uinds  of  references  made  to  temporarily  stored  data.  The  processing 
of  a binary  operator  usually  proceeds  as  follows: 

1.  process  first  operand 

2.  process  second  operand 

3.  move  target  path  operand  to  temporary 

4.  operate  on  temporary  with  non-target  operand 

5.  leave  result  in  temporary 

The  TNBIND  processing  mimics  the  actions  of  the  code  which  will  eventually  be 
produced.  (If  the  target  machine  organization  dictates  some  other  sequence  of 
evaluation  for  a binary  operator,  then  the  TNBIND  processing  will  change 
accordingly.) 
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First  the  two  operands  are  processed  by  TNBIND  with  the  current  node’s  TN 
being  passed  as  a target  to  the  target  path  operand.  This  processing  occurs 
recursively  until  the  bottom  of  the  tree  is  reached.  After  the  two  operands  have 
been  processed,  we  look  to  see  whether  the  target  path  operand  "made  Its  target."  If 
not  we  must  allow  for  the  move  of  the  value  of  the  target  path  operand  from  the  TN 
where  it  exists  to  the  TN  of  the  current  node  by  increasing  the  costs  of  both  TNs  by 
the  cost  of  the  move.  In  this  case  we  also  record  the  fact  that  we  prefer  that  the 
two  TNs  (target  path  TN  and  current  node  TN)  be  assigned  to  the  same  location.  If 
they  can  be  assigned  to  the  same  location  by  the  later  packing  phase,  the  data  move 
can  be  eliminated.  We  next  update  the  cost  values  of  the  non-target  TN  and  the 
current  .node’s  TN  to  include  the  cost  of  the  operation.  Lastly  we  decide  whether  to 
bind  the  current  node’s  TN  to  the  target  that  was  passed  in  from  above.  Thus 
targeting  decisions  are  made  by  the  recipient  of  the  target  request,  and  targeting  is 
done  by  the  requestor  when  the  targeting  request  Is  rejected. 

In  order  to  specify  these  actions  we  need  to  be  able  to  talk  conveniently  about 
nodes,  operands,  TNs  and  costs.  We  need  to  define  the  basic  operations  to  be 
performed  and  describe  the  sequence  of  these  operations  that  should  be  performed 
for  each  node  type. 

The  primitives  we  need  for  the  typical  binary  operator  described  above  are 

TNP(node, target) 

node:  a tree  node 
target:  a TN  or  0 

Invoke  the  TNBIND  processing  on  "node"  passing  "target"  as  the 
target.  When  no  target  is  being  passed,  "target"  Is  0.  This  Is  the 
function  invoked  for  each  node  by  the  recursive  tree-walk 
algorithm.  Common  actions  are  performed  by  TNP  and  then  the 
node-specific  function  for  the  node  type  is  invoked. 
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Pa  ss  Tar  get(node, target) 

node:  a tree  node  which  is  an  operand  of  a binary  operator 
target:  a TN 

Evaluates  to  "target"  if  "node"  is  the  target  path  operand. 
Otherwise  the  value  is  0 meaning  no  target.  1 his  conveniently 
expresses  the  notion  that  the  target  is  passed  only  to  the  target 
path  operand  without  assuming  which  operand  of  a binary  operator 
lies  on  the  target  path. 

Notell$e(t,when) 
t:  a TN 

when:  a Ion  value 

Add  "when"  to  the  list  of  nodes  which  ere  uses  of  "t". 

NoteCreation(t,when) 
t:  a TN 

when:  a Ion  value 

Add  "when"  to  the  list  of  nodes  which  are  creations  of  "t". 

Move(tfrom,tto) 

tfrom;  a TN  or  tree  node 
tto:  a TN  or  tree  node 

If  there  is  a TN  associated  with  "tfrom"  then  invoke  NoteUse 
passing  the  TN  and  the  current  Ion  value.  Similarly  for  "tto"  except 
invoke  NoteCreation, 

PrefMove(tfrom,tto) 

tfrom:  a TN  or  tree  node 
tto:  a TN  or  tree  node 

Indicate  that  the  TNs  associated  with  "tfrom"  and  "tto"  should  be 
assigned  to  the  same  location  if  possible  (preferencing),  then 
invoke  Moveftfrom.tto). 
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Operate(op,ol,o2) 

op:  «.  machine  operation 
ol:  a TN  or  tree  node 
o2:  a TN  or  tree  node 

Invoke  NoteUse  and/or  NoteCreation  as  appropriate  for  the  action 
of  executing  "o2  «-  o2  op  ol”.  For  example,  the  call 
"operated, X,Y)"  means  that  the  action  "X «-  X + Y"  is  a use  of  Doth 
X and  Y and  therefore  invokes  NoteUse  on  both  X and  Y.  For 
unary  operators  we  omit  "o2“.  Some  operations  are  specified 
generically,  e.g.  "test"  meaning  "test  for  true  or  false". 

Bind(tl,t2) 

tl:  a TN 
t2:  a TN 

Bind  the  two  TNs  together.  That  is,  force  them  to  refer  to  the 
same  locution.  This  is  the  action  when  a node  "makes  its  target." 

In  addition  to  these  primitives  we  need  a few  notational  conventions  to  simplify 
the  explanation  of  the  processing  routines.  We  will  use  the  notation  "X[field  name]" 
to  represent  the  value  of  the  named  field  of  the  item  named  X.  Thus 
"q[first  operand]"  will  refer  to  :irst  operand  of  a node  named  q.  We  will  also  make 
use  of  the  following  abbreviations. 

Abbr.  meaning 

MyTN  node[temporary  name] 

Oprl  node[first  operand] 

0pr2  node[second  operand] 

OpType  node[operetion] 

We  are  now  in  a position  to  state  the  actions  necessary  to  process  a binary 
operator  node  in  terms  o»  * ie  primitives. 

TNP(Opr  1 ,Pa  ssTarge  t(0pr  1 ,MyTN)) 
TNP(0pr2,PassTarget(0pr2,MyTN)) 
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PrefMove(node[target  path],MyTN) 
Operate(OpType,node[non  target  path],MyTN) 


3.1.2  An  examplo 

Let  us  consider  the  processing  of  the  assignment  statement  of  Figure  2-1.  The 
tree  is  reproduced  in  Figure  3-1  with  names  attached  to  the  nodes  so  that  we  can 
refer  to  them  easily.  TN  processing  begins  with  node  Nj,  th  assignment  operator. 
The  assignment  operator  invokes  TNP  on  each  of  its  operands,  passing  no  targets 
since  neither  operand  is  flagged  as  the  target  path  operand.  Figure  3-2  shows  the 
complete  sequence  of  invocations,  The  calls  to  PassTarget  are  not  shown;  rather  the 
result  of  the  PassTarget  evaluation  is  shown  explicitly  in  subsequent  calls  to  TNP. 

In  Figure  3-2  we  see  the  TN  processing  recurring  until  the  bottom  of  the  tree 
Is  reached.  This  first  happens  when  Ify  processes  node  D.  After  N4  has  processed 
both  of  its  operands,  it  sees  that  its  target  request  to  node  D was  rejected  and 
therefore  invokes  PrefMove  to  indicate  that  the  value  of  D must  be  moved  into  tp  At 
this  time  PrefMove  updates  the  costs  of  tj  and  tp,  invokes  Notellse  and  NoteCreation 
on  tp  and  t^  respectively,  and  adds  each  TN  to  the  other’s  preference  list.  N4  then 
invokes  Operate  to  update  the  costs  of  the  TNs  involved  in  the  operation.  Finally  N4 
invokes  Bind  to  bind  its  own  TN  to  the  target  TN, 

After  processing  its  second  operand,  N3  sees  that  N4  accepted  the  target 
request  meaning  that  the  value  of  N4  will  be  left  in  tg.  Ng  invokes  Operate  and  Bind 
and  returns  to  Ng.  Ng  invokes  Operate  and  returns  to  N^  which  calls  PrefMove  to 
update  the  costs  for  the  data  move  from  t3  into  A. 

The  results  of  this  processing  are: 

1.  tp  tg,  and  tg  will  all  refer  to  the  same  location.  This  means  that  the 
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TNP(B.O) 
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TNP(N4,t2) 
TNPtD.tj) 
TNP(2,0) 
PrefMove(D,ti) 
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Bind(t2,*  A) 
TNP(C,0) 


Operated, t2,C) 
Bind(t3,t2) 
Operated, t3,B) 
PrefMove(t3,A) 


Figure  3-2.  A trace  of  TN  processing  actions. 


expression  on  the  right  hand  side  of  the  assignment  will  be  evaluated 
without  any  unnecessary  data  moves. 

2.  tD  is  preferenced  to  tj.  If  this  use  of  D happens  to  be  the  end  of  a 
lifetime  segment,  then  tD  and  ti  will  be  assigned  to  the  same  locution  thus 
eliminating  the  initial  data  move. 

3.  tA  is  preferenced  to  t3.  If  the  lifetimes  of  tA  and  t3  have  no  points  in 
common,  tA  and  t3  will  be  assigned  to  the  same  location  thus  elimin’tinR 
the  final  data  move. 


3.1.3  Tho  storo  operator 

The  store  operator  is  considered  to  be  a special  properties.  These  properties 
are  due  to  the  special  optimizations  which  may  be  performed  on  store  operations.  In 
general  we  evaluate  the  right  hand  side  leaving  the  result  in  some  TN  and  then  we 
the  value  from  the  TN  to  the  location  named  by  the  left  hand  side.  We  would 


move 
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like  to  eliminate  the  final  data  move  when  the  store  operation  is  inherently  aimple. 
We  say  that  a store  operation  is  simple  when  it  is  possible  to  bind  the  TN  of  the  right 
hand  side  to  the  location  of  the  left  hand  side.'  We  define  a predicate  in  order  to 
Identify  the  simple  cases: 

SimpleStorefleft, right) 
left:  a tree  node 
right:  a tree  node 

Trot  if  it  is  possible  to  bind  the  TN  of  "right"  to  the  location  of 
"left")  foist  otherwise. 

A few  of  the  operations  which  might  be  aimple  on  some  machines  are 

1.  A«-A+k  <k  a constant) 

2.  A+-B+K 

3.  A«-A  op  B(op  one  of  some  set  of  operators) 

4.  A«-A  op  e(e  an  expression) 

3.2  Other  operators 


3.2.1  Unary  operator 

With  the  notation  established,  we  can  easily  describe  the  actions  required  by 
other  types  of  tree  nodes.  The  simplest  of  these  is  the  typical  unary  operator: 

TNP(Oprl,MyTN) 

PrefMove(Oprl,MyTN) 

Operate(OPTYPE,MyTN) 

As  noted  in  the  discussion  of  binary  operators,  the  actions  taken  In  TN  processing 
mirror  the  code  that  will  be  used  to  implement  the  operation. 


t The  exact  definition  of  which  store  operations  are  simple  is  dependent  on  the 
capabilities  of  the  target  machine,  but  the  concept  is  machine  Independent 
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3.2.2  If-then-elsa 

The  processing  for  more  complex  nodes,  such  as  if-then-else.  is  also  easily 
expressed  in  terms  Of  Our  primitives. 

TNP(node[boolean],0) 

Operate(test,node[boolean]) 

TNP(node[then  part],MyTN) 

PrefMove(node[then  part],MyTN) 

TNP(node[else  part], My TN) 

PrefMcve(node[else  part],MyTN) 

Note  that  this  presumes  that  the  if-then-else  has  a value.  If  the  language  does  not 
provide  for  an  if-then-else  expression  then  the  calls  to  PrefMove  and  the  passing  of 
targets  to  the  then  and  else  parts  may  be  eliminated. 

3.2.3  Simple  hops 

The  processing  lor  a while  loop  is  also  very  straightforward.  Remembering 
that  at  this  point  we  need  not  Know  what  instructions  will  be  generated  to  Implement 
the  loop,  we  need  only  process  the  subnodes  of  the  while  in  the  order  In  which  the 
code  will  be  executed.  The  fact  that  the  body  of  the  loop  may  be  executed  several 
times  is  irrelevant  to  the  basic  processing,  but  should  be  considered  in  the  cost 
computations. 

TNPfnodefbooleanJ.O); 

Operate(test,node[boolean]) 

TNP{node[body],0) 

Note  that  in  this  case  we  make  no  provision  for  the  value  of  the  while  loop.  If, 
for  example,  we  wanted  to  specify  that  the  value  of  the  loop  was  the  value  of  the 
last  evaluation  of  the  body,  we  could  pass  MyTN  as  a target  to  the  body  and  insert  a 
PrefMove  at  the  end  of  the  processing  routine. 
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3.2.4  Complex  operations 

Up  to  this  point  we  have  discussed  language  constructs  that  are  common  to 
most  algebraic  languages  and  easily  implemented  on  most  computer  hardware.  One  of 
the  advantages  of  the  TNBIND  model  of  using  temporary  storage  is  the  ability  to 
handle  new  and  more  complex  language  constructs  much  closer  to  the  language  level 
rather  than  at  the  machine  instruction  level. 

Consider  a language  which  Implements  lists  of  items  as  a primitive  data  type. 
There  might  be  a language  construct  which  allowed  a sequence  of  statements  to  be 
performed  on  each  element  of  a list.  A programmer  might  write  something  iike 

forail  I in  L do  begin  Sjj  $2i  S31 . . .1  sn  endj 

The  tree  node  for  such  a construct  wouid  have  three  subnodes:  one  for  the  item 
name  (I),  one  for  the  fist  name  (L)  and  one  for  the  body.  The  processing  in  TNBIND 
might  look  like 

NoteCreation(node[item  name],node[lon]) 
Operate(end-of-list-test,node[item  name]) 

TNP(node[bpdy],0) 

Operate(next-item,node[item  name]) 

where  the  linear  block  information  would  show  that  the  end-of-!ist-test  was  the 
successor  of  the  next-item  operation.  This  example  is  rather  explicit  in  its  use  of  a 
TN  to  hoid  each  element  of  the  list  in  turn.  It  is  easy  to  imagine,  however,  that  the 
same  treatment  could  be  given  to  other  constructs.  For  example,  it  might  be  the  case 
that  the  "forali"  construct  was  only  an  internal  representation  produced  by  the 
semantic  analysis  phase  so  that  the  programmer’s 

L ♦-  sqrt(L) 
is  transformed  into 


forail  I in  L do  I *-  sqrt(I) 
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3.3  Cost  computations 

A very  important  part  of  the  TNE3IND  process  is  the  computation  of  the  costs 
associated  with  each  TN.  One  of  the  simplest  measures  of  cost  to  compute  is  object 
code  size.  It  is  easy  to  calculate  the  size  of  the  object  code  required  for  a given 
access  to  a TN,  This  is  a particularly  interesting  cost  measure  for  a machine  which  an 
instruction  which  accesses  a register  is  smaller  that  an  equivalent  instruction  which 
accesses  a memory  location.  On  other  machines  some  other  measure  may  be  more 
appropriate.  The  point  is  not  what  the  measure  is,  but  rather  that  the  cost 

computation  can  be  separated  from  the  rest  of  the  processing  and  modified 

independently  as  desired. 

As  discussed  in  Chapter  2,  we  want  to  collect  relative  cost  measures;  the 
absolute  measures  are  of  only  marginal  interest.  In  terms  of  a code  size  measure, 
the  cost  da*a  we  want  to  get  information  such  as  "Assigning  variable  X to  a register 
instead  of  a memory  location  will  save  8 words  of  code."  We  can  then  say  that  the 
cost  of  (failing  to  allocate  a register  for)  X is  8 words.  If  code  size  is  our  only 
measure  then  it  makes  no  difference  whether  the  8 is  the  difference  between  2 and 
10  or  between  100  and  108, 

The  costs  are  calculated  in  very  much  the  same  way  that  the  TN  processing  is 
carried  out,  i.e.  by  having  separate  routines  to  handle  the  specific  information  about 
each  node  type  and  a driver  routine  to  handle  the  common  information.  The  cost 

calculations  can  be  included  in  the  TN  processing  routines,  or  they  can  be  separated 

into  a separate  pass  ofer  the  tree. 

The  cost  computations  rely  most  heavily  on  the  attributes  of  the  target 
machine.  The  data  that  must  be  available  for  the  cost  computations  Includes  the 
following  values  for  each  of  the  storage  classes  to  be  considered. 
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The  cost  of  a simple  access  of  a TN. 

The  cost  of  isolating  a subfield  of  a TN. 

The  cost  of  implementing  non  primitive  functions,  e.g.  the  cost  of 
performing  the  "exclusive  or"  operation  on  a machine  which  does 
not  have  a corresponding  hardware  function. 

In  traversing  the  tree,  all  references  to  a TN  are  analyzed  and  the  costs  are 
accumulated.  Those  TNs  which  must  be  assigned  to  a register  are  assigned  arbitrarily 
high  costs  to  assure  that  they  are  treated  first  in  the  packing  process.  The  TN 
processing  phase  attempts  to  keep  the  lifetimes  of  such  TNs  as  small  as  possible, 
relying  on  the  preferencing  operation  to  eliminate  loads  and  stores  when  possible. 

• 

3.4  Mechanically  generating  TNBIND 

It  should  be  possible  to  generate  the  TNBIND  model  of  a compiler  Mechanically 
from  a description  of  the  input  language  and  the  target  machine.  This  is  not  to  say 
that  there  is  an  obvious  algorithm  which  accepts  BNF  and  ISP  descriptions  as  input 
and  produces  program  text  as  output,  but  rather  that  there  is  a systematic  way  of 
using  knowledge  about  the  language  and  the  machine  to  generate  the  necessary 
TNBIND  routines. 

One  possibility  for  specifying  the  language  is  to  provide  functional  descriptions 
of  the  operators  which  look  very  much  like  the  descriptions  in  this  chapter.  That  is, 
specify  for  each  node  type  the  sequence  n*  actions  it  should  take  during  program 
execution.  The  information  is  not  very  specific,  but  rather  a much  more  general 
description  of  the  order  in  which  the  subnodes  are  evaluated  and  how  the  values  of 
the  subnodes  are  to  be  used.  Any  initial  attempt  to  specify,  a new  language  for 
TNBIND  will  most  certainly  involve  writing  out  the  processing  routines  for  each  node 
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type  in  the  tree  representation  of  the  language.  In  TNBIND  we  are  concerned  not 
with  the  syntax  of  the  input  language  but  with  the  semantics. 

Several  pieces  of  information  about  the  target  machine  obviously  must  be 
supplied.  These  include  the  types  of  locations  to  be  used  for  temporary  storage  and 
the  number  of  locations  of  each  type.  TNBIND  must  also  know  the  functional 
characteristics  of  the  various  operations  which  can  be  performed  by  the  target 
machine.  For  example,  if  the  target  machine  is  of  the  "general  register"  variety 
without  no  memory-to-memory  instructions,  TNBIND  must  know  that  there  must  be  a 
TN  associated  with  the  right  hand  side  of  every  assignment,  and  that  that  TN  must  be 
bound  to  a register.  One  candidate  for  specifying  such  information  is  Newcomer’s 
attribute  sets  and  transformations  [New75],  Building  TNBIND  requires  both  the 
knowledge  of  what  the  attributes  and  transformations  are  and  the  costs  of  making  the 
transformations.  The  appropriate  cost  measures  and  values  are  not  readily 
determined  from  classical  descriptions  of  computer  hardware. 

The  advantage  of  the  TNBIND  model  is  that  it  provides  a mold  into  which  we 
can  fit  descriptions  of  languages  and  machines.  In  this  role  the  model  serves  to  point 
out  options  and  keep  the  treatment  Of  Operators  uniform. 
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Chapter  4 

Determination  of  TN  Lifetimes 

One  of  the  assumptions  made  in  discussing  the  problem  of  register  allocation  is 
that  it  Is  in  (act  a problem.  As  computer  hardware  advances  are  made,  the  number  of 
registers  which  can  reasonably  be  made  available  increases.  As  long  as  there  are 
more  registers  available  than  there  are  data  items  to  store  in  them,  there  is  no  real 
problem.  On  the  other  hand,  in  such  cases,  we  begin  to  think  of  new  ways  to  us.  the 
registers  to  increase  the  etticiency  ol  our  programs.  When  machines  moved  away 
from  the  single  accumulator  model,  it  became  possible  to  consider  using  registers  tor 
purposes  other  than  expression  evaluation. 

A number  of  articles  have  appeared  describing  methods  of  program 
optimization  th  ough  judicious  use  of  the  registers.  Lowery  and  Medlock  [Low69] 
describe  the  analysis  don.  in  the  FORTRAN  H compiler  to  keep  the  values  ol 
frequently  used  variables  in  registers  within  loops.  This  type  ol  optimization  is 
frequently  referred  to  as  "load-store  motion"  [AII71a], 

A good  deal  of  work  has  been  done  in  the  area  ot  program  control  How  for  the 
purpose  ol  finding  the  paths  along  which  the  value  of  a variable  may  be  retained  in  a 
register.  Allen  [AII70]  and  Beatty  [Bea71]  discuss  the  use  of  graph  theory  and  the 
concepts  ot  reg  ons  and  intervals  ot  a graph  to  determine  the  aspects  of  control  flow 

relevant  to  register  allocation. 

The  current  state  ol  computer  hardware  provides  us  with  (in  most  cases)  more 
than  enough  registers  to  evaluate  the  most  complex  expressions  occuring  in 
programs.  In  order  to  improve  the  efficiency  of  our  programs  we  would  like  to  keep 
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the  values  of  frequently  used  variables  in  registers  where  they  can  be  accessed  with 
more  speed  and  sometimes  with  shorter  instructions.  A simple  approach  is  to  first 
allocate  the  registers  necessary  for  expression  evaluation  and  then  use  the  remaining 
registers,  if  any,  to  hold  one  variable  each,  inserting  the  necessary  load  and  store 
instructions  into  the  code  at  the  beginning  and  end  of  the  program  segment  being 
considered  (the  most  frequent  piece  of  program  used  with  this  method  is  the  loop  as 
in  [Low69]).  A more  ambitious  goal  is  to  multiplex  several  variables  into  each 
register.  This  idea  has  been  discussed  in  [Day70]  and  [Bea71].  The  approach  taken 
in  the  TNBIND  model  is  of  the  multiplexing  variety,  but  with  a different  emphasis.  In 
the  TNBIND  model  variables  are  either  assigned  to  a register  throughout  their 
lifetimes  or  they  are  not,  This  is  conceptually  much  simpler  than  a model  which  loads 
and  stores  several  variables  during  the  course  of  executing  a program.  The 
disadvantage  is  that  some  variables  used  heavily  in  the  inner  loop  of  a program  may 
not  be  assigned  to  registers  by  TNBIND.  We  might  encourage  the  assignment  of 
registers  for  variables  used  in  loops,  or  we  might  seek  a way  to  incorporate  the  loop 
optimizations  into  the  TNBIND  model.  The  latter  possibility  is  discussed  in  Chapter  7. 

This  chapter  describes  a method  of  determining  the  lifetime  of  temporary 
names.  We  want  a characterization  of  the  lifetimes  which  is  not  only  accurate  but 
also  very  precise  so  that  we  can  make  maximal  use  of  the  registers  by  assigning 
several  TNs  to  each  register.  The  lifetime  of  a TN  is  the  set  of  those  segments  of 
the  program  during  which  the  value  of  the  TN  must  be  available.  The  complexity 
involved  in  determining  the  lifetime  of  a TN  is  related  to  the  type  of  TN,  e.g.  TNs 
which  are  compiler  generated  temporaries  generally  have  much  shorter  and  more 
easily  determined  lifetimes  than  TNs  which  are  user  variables.  A few  definitions  will 
help  in  the  discussion  of  lifetime  determination, 
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4.1  Definitions 

A program  point,  p,  is  used  to  label  an  instant  in  the  execution  of  a program. 
The  term  is  also  applied  to  the  static  program  snd  essentially  names  a particular 
object  code  instruction.  A program  point  represents  a node  in  the  graphical 
represt  ntation  of  the  program.  A successor  of  a node  is  an  element  of  the  set  of 
instructions  which  may  be  executed  immediately  after  the  instruction  represented  by 
the  node.  In  terms  of  the  program  tree,  the  successors  of  a node  are  either  brothers 
or  immediate  ancestors. 

A flow  path  is  a sequence  of  program  points  P2 pn  such  that  for  all  i 

(lSKn)  p|+1  is  an  immediate  successor  of  pj.  A flow  path  describes  a possible 
sequence  of  nodes  in  the  program  flow  graph.  The  length  of  a f'ow  path  is  the 
number  of  transitions  necessary  to  move  from  the  initial  point  to  the  final  point  of  the 
path,  i.e.,  the  length  is  one  less  than  the  number  of  points  in  the  path. 

A use  of  a TN  is  any  reference  to  the  TN  which  requires  the  value  stored  In  it. 
This  includes  simple  loading  of  the  value  or  assigning  a new  value  which  is  a function 
of  the  old  value.  The  latter  Kind  of  reference  may  be  called  a modification. 

A creation  of  a TN  is  a reference  which  stores  a new  value  (not  a function  of 
the  old  value)  into  the  TN, 

A Ion,  or  linear  order  number,  is  a unique  number  assigned  to  each  node  in  the 
program  graph.  The  Ion  values  are  used  to  name  the  program  points.  The  values 
increase  along  any  flow  path  throt-gh  the  program  graph  except  in  the  case  of  loops. 
For  loops  the  successor  of  the  loop  has  a Ion  which  is  greater  than  the  Ion  of  every 
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node  in  the  loop.1.  Within  a linear  block  the  Ion  values  are  consecutive. 

A TN  is  alive  at  a point  pj  iff  there  exists  a flow  path  pf,  . . Pn  such  that  pn  is 
a use  of  the  TN  and  for  every  pj  (i<j<n)  on  the  path,  Pj  is  not  a creation  of  that  TN. 
The  initial  point  of  the  path,  pj(  need  not  reference  the  TN  at  all.  The  essential  idea  Is 
that  the  value  of  the  TN  must  be  preserved  at  every  point  along  the  path. 

The  lifetime  of  a TN  is  the  set  of  points  in  the  program  at  which  the  TN  is  alive. 
The  kind  of  information  we  need  to  determine  lifetimes  is  similar  to  that  used  in  global 
common  subexpression  recognition.  In  order  to  recognize  common  subexpressions 
we  need  to  know  when  the  value  of  a variable  changes  so  that  we  can  find  all  of  the 
expressions  involving  that  variable  and  mark  them  as  changed.  This  is  essentially  an 
analysis  of  program  flow  and  has  been  discussed  by  several  authors  [AII70,  Coc70, 

Ges72]. 

For  the  purpose  of  describing  the  lifetime  determination,  we  will  assume  the 
following  information  is  available:  (1)  a description  of  the  linear  blocks+  of  the 
program  in  terms  of  their  starting  and  ending  Ion  values,  (2)  the  starting  Ion  values  of 
each  successor,  and  (3)  a list  tor  each  TN  indicating,  by  Ion  value,  the  nodes  which 
are  creations  or  uses  of  the  TN, 

A connection  matrix  is  a matrix  of  binary  values  representing  the  successor 
relationships  among  the  nodes  of  a graph.  The  connection  matrix  C of  a graph  of  n 
nodes  is  an  n x n matrix  <Cj j«l  if  node  j is  an  ’mmediate  successor  of  node  I,  Cjj-0 


t The  assumption  here  is  that  we  are  not  burdened  with  the  unrestricted  control 
structures  that  can  be  constructed  by  using  an  arbitrary  gojo.  This  is  not  to  say  that 
the  method  to  be  described  will  not  work  for  such  structures.  Rather  the  simplying 
assumption  merely  makes  the  exposition  of  the  method  less  complex. 

+ A linear  block  has  exactly  one  entry,  one  exit,  and  no  internal  branches.  It  is,  in  the 
most  restrictive  sense,  "straight  line"  code. 
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otherwise).  If  the  nodes  of  a program  graph  are  numbered  by  Ion  as  defined  abo 
then  some  generalizations  about  the  corresponding  connection  matrix  may  be  stated. 
Within  a linear  block  the  only  successor  of  a node  is  the  nod.  with  the  next  highest 
ion  value,  i.e.,  if  node  i is  QOt  the  last  node  of  a linear  block  then  c„-l  Iff  re- 
consider two  linear  blocks  10!  and  LB2  composed  of  nodes  a,  eel,  • • - b and 

d,  d,i e respectively.  By  the  definition  of  linear  block  w.  know  that  the  two 

sequences  have  no  points  in  common  and  that,  except  possibly  at  the  endpoints,  no 
node  of  on.  block  is  a successor  of  any  nod.  In  the  other  block.  In  the  connection 

matrix 


a $ i < b a dSj<e  implies  Cjj  - 0 - Cjj 
Moreover,  I.  w.  think  about  a matrix  representing  the  connections  among  linear 
blocks,  we  know  that  LB2  is  a successor  of  LB!  iff  cbd-l  and  LB,  is  a successor  of 
LB2  iff  Cea-l.  By  knowing  the  composition  of  the  linear  blocks  we  can  y 

convert  back  and  forth  between  the  full  connection  matrix  and  the  matrix  of  linear 

block  connections. 


4.2  An  example 


A sample  program  graph  with  Ion  values,  linear  blocks,  and  the  linear  block 
connection  matrix  is  shown  in  Figure  4-1.  The  example  W-  Has  99  tree  nodes 
broken  down  Into  7 linear  blocks.  Consideration  of  the  connection  matrix  of  linear 
blocks  instead  of  the  connection  matrix  of  nodes  yields  a significant  reduction  in  the 
amount  of  date  required.  Unfortunately  this  abstraction  does  n-t  contain  all  of  the 
information  from  the  original  connection  matrix,  although  the  lost  information  may  be 

recovered. 


Primarily  because  of  the  increased  stor.ge  efficiency,  ws  would  like  .0  be  eble 
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Ion  values 

linear  blocks 

el  = 1-20 

1 - el 

e2  - 21-25 

2 - e2 

e3  - 26-42 

3 - e3;  e4 

e4  - 43-47 

4 ■ e5 

e5  - 48-71 

5 - e6 

e6  - 72-84 

6 ■ e7 

e7  - 85-93 

7 - e8 

e8  - 94-99 

connection  matrix 
12  3 4 5 6 7 


el;  while  e2  do  (e3;  if  e4  then  e5  else  e6;  e7);  e8 
Figure  4-1.  The  example  program. 
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to  perform  matrix  operations  on  the  full  connection  matrix  by  manipulating  the  smaller 
linear  block  connection  matrix.  In  order  to  do  this  v/e  must  transform  the  linear  block 
connection  matrix  into  a more  accurate  representation  of  the  full  connection  matrix. 
In  Figure  4-2  the  connection  mat'ix  has  been  modified  to  include  entries  labeled  "S" 
(for  sequential)  along  the  main  diagonal.  The  connection  matrix  is  now  a shorthand 
for  the  much  larger  node-level  connection  matrix.  In  our  shorthand  matrix,  each 
element  represents  a submatrix  of  the  full  connection  matrix.  The  interpretation  of 
the  values  in  the  shorthand  is 

0 mj(j  - 0 Vi,j 

1 mi,n  “ mj(j  " 0 f or  other  i,j;  n is  the  number  of  rows  in  the  submatrix 

S -»  mj  j+1  ■*  1,  m[i,j]  - 0 for  jHi+1 

Figure  4-3  shows  an  expanded  section  of  the  matrix  from  Figure  4-2.  The 
rows  and  columns  in  Figure  4-3  are  labeled  with  Ion  values.  The  lines  within  the 
matrix  show  the  partitioning. 
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1 2 3 4 5 6 7 
S 1 0 0 0 0 0 
0 S 1 0 0 0 1 
0 0 S 1 1 0 0 
0 0 0 S 0 1 0 
0 0 0 0 S 1 0 
0 1 0 0 0 S 0 

0 0 0 0 0 0 s 


Figure  4-2.  Connection  matrix  with  "S"  entries. 
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Figure  4-3.  Expanded  section  of  connection  matrix.  (Linear 
block  2 and  a portion  of  linear  block  1). 


A second  modification  to  the  matrix  is  based  on  the  information  we  wish  to 
obtain  from  the  program,  namely  what  nodes  in  the  program  are  on  flow  paths  leading 
to  any  use  of  a particular  7N,  In  this  context  any  creation  of  the  TN  effectively 
terminates  the  path.  More  simply,  a creation  node  has  no  predecessors.  This  is 
reflected  in  the  matrix  by  setting  the  columns  corresponding  to  the  creation  nodes  to 
0 (indicating  that  they  cannot  be  successors  of  any  node).  In  the  shorthand  matrix 
this  is  done  by  first  making  the  creation  nodes  separate  linear  blocks.  If  we  consider 
a TN  with  creations  at  nodes  6,  67,  and  80  (in  linear  blocks  1,  4,  and  5)  the  resulting 
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shorthand  matrix  will  appear  as  shown  in  Figure  4-4.  The  columns  labeled  lb,  4b, 
and  5b  are  the  new  blocks,  each  consisting  ot  a single  node  Although  the  matrix  we 
must  now  consider  has  increased  in  size  from  7x7  to  13x13,  it  is  a far  cry  from  the 
potential  99x99  matrix  we  would  have  i<  we  considered  each  node  separately.  It  is 
important  to  keep  the  size  of  the  matrix  to  a minimum  since  from  this  point  on  the 
algorithm  must  be  performed  separately  for  each  TN. 
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Figure  4-4. 

Connection  matrix  with  creations  partitioned. 

The  matrix  we  are  considering  (actually  the  complete  matrix  it  represents) 
shows  only  the  connections  along  paths  of  length  one.  That  is,  Cjj-1  when  there  is  a 
flow  path  of  length  one  from  node  i to  node  j.  In  order  to  be  able  to  answer  the 
question  about  where  a TN  is  alive,  we  must  know  about  all  paths  of  any  length.  Wc 
know  from  the  use  of  connection  matrices  in  graph  theory  that  the  boolean  product 
of  the  connection  matrix  with  itself  will  produce  a matrix  showing  paths  of  length  two. 
Let  us  consider  why  this  is  so,  First  recall  that  the  boolean  product  of  matrices  is 
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just  like  the  algebraic  product  except  that  multiplication  and  addition  are  replaced  by 
the  and  and  or  operations  respectively.  Thus  each  element  of  the  product  matrix  is 
produced  by  oring  together  several  elements  which  have  been  produced  by  ending 
corresponding  elements  of  the  original  matrices. 

The  boolean  product  P of  matrices  A and  B is  defined  by 

A is  of  order  n x k 

B is  of  order  k x m 

P is  of  order  n x m 

k 

P - A * B iff  pj:  » v (a,,  a brj) 

> r-1  ' 

By  this  definition  we  see  that  pjj-1  iff  3x  j ajx-l  and  bxj-l.  For  the  product 
of  a connection  matrix  with  itself  we  have 

Pij  “ " <cir  Acrj> 

> r-l  ' 

- 1 iff  3x  > C|x-1  and  cxj-l 

This  is  exactly  the  statement  that  there  is  some  program  point  px  such  that  there  Is  a 
flow  path  pj,  py,  pj  in  the  graph  of  the  program;  i.e.,  there  is  a path  of  length  two. 

If  we  call  the  original  connection  matrix  indicating  that  it  reflects  paths  of 
length  one  then  C**C^-C^  which  reflects  paths  of  length  two.  Similarly  C^-C^aC^, 
C^-C**C^,  etc.  Forming  the  element-wise  or  of  two  or  more  of  these  matrices 
produces  a new  matrix  whose  elements  indicate  paths  of  any  of  the  constituent 
lengths,  i.e.,  C*  or  indicates  paths  of  length  two  or  less.  The  boolean  sum  C*  or 
or  ■ • • or  C00  is  the  matrix  we  are  seeking,  the  one  which  Indicates  whether  there 
is  a path  of  any  length  between  any  pair  of  nodes.  In  practice  it  Is  only  necessary  to 
accumulate  terms  of  the  sum  until  it  converges  since  in  a finite  graph  there  is  a finite 
maximum  length  path  which  does  not  contain  cycles.  If  the  graph  contains  n nodes 
then  the  sum  will  converge  in  no  more  than  n steps  since  a path  of  length  n+1  or 
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greater  must  contain  a cycle.  The  converged  sum  is  called  the  closure  or  transitive 
closure  of  the  connection  matrix.  Clearly  we  can  find  the  closure  of  a connection 
matrix  by  forming  the  products  and  oring  them  together.  This  solution  is  unattractive, 
however,  since  the  operation  of  finding  the  product  is  itself  an  n^  algorithm.  Steven 
Warshall  [War62]  devised  and  proved  an  algorithm  which  will  determine  the  closure  of 
a connection  matrix  in  time  proportional  to  n^.  The  basic  approach  of  the  algorithm  is 
to  operate  on  rows  of  the  connection  matrix.  Wh?n  processing  the  ith  row,  any  row 
which  has  a 1 in  the  ith  column  is  ored  into  the  ith  row,  Warshall  shows  that  when 
the  rows  and  columns  are  selected  in  the  proper  order  the  closure  is  formed  in  a 
single  pass  over  the  matrix.  The  algorithm  performs  nicely  on  our  shorthand  matrices 
when  we  define  the  or.  operation  on  the  shorthand  elements  as 

0 v X = 0 

1 v X - 1 

S v S = S 

where  X may  take  on  any  of  the  three  values  (0,  1,  S)  and  our  interpretation  of  the 
symbols  in  the  shorthand  for  the  closure  is  changed  to 

0 -»  rrijj  “ 0 Vi, j 

1 *♦  mjj  - 1 Vi.j 

S -*  rrijj  — 1 Vj,i 

Remembering  that  the  elements  of  the  shorthand  matrix  correspond  to  the 
linear  blocks  of  the  original  graph,  it  is  easy  to  understand  the  definition  and  the 
change  of  notation.  In  the  closure  the  element  representing  a linear  block  (the  S 
elements)  will  he  e at  least  all  l’s  above  the  main  diagonal  because  there  is  a path 
from  any  node  in  the  block  to  any  later  node  in  the  block.  The  1 elements  in  the 
connection  matrix  represented  connections  between  blocks,  but  clearly  if  there  is  a 
path  from  one  node  of  one  block  to  a node  of  a second  block,  then  there  must  be  a 


Determination  of  TN  Lifetimes 


63 


path  from  every  node  in  the  first  block  to  every  node  in  the  second.  The  S entries 
serve  only  to  remind  us  of  the  interior  detail  of  the  connections  of  the  nodes. 
Indeed  if  there  is  a path  from  a linear  block  back  to  Itself  ther  the  entry  for  the 
block  will  change  from  S to  1 (S  v 1 - 1)  indicating  that  there  is  a path  from  every 
node  in  the  block  to  every  other  node  In  the  block.  Figure  4-5  6hows  the  closure 
of  the  matrix  from  Figure  4-4. 
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Figure  4-5.  Closure  of  partitioned  matrix. 


Let  us  reflect  for  a moment  on  the  information  contained  in  the  closure  of  the 
connection  matrix.  By  zeroing  the  columns  associated  with  creations,  we  have 
assured  that  no  path  in  the  matrix  passes  through  a creation  node.  The  interpretation 
of  the  elements  of  the  closure  is  that  the  i,j  element  is  1 iff  there  is  a path  from  node 
i to  node  j that  does  not  pass  through  a creation.  If  there  exists  a j such  that  node  j 
is  a use  and  element  i,j  of  the  matrix  is  a 1,  then  it  follows  that  the  TN  is  alive  at  node 
i.  The  lifetime  of  the  TN  is  obtained  as  a bit  vector  by  oring  together  the  columns  of 
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the  closure  matrix  that  correspond  to  the  uses  of  the  TN,  Like  the  other  operations, 
this  too  can  be  performed  in  terms  of  the  shorthand  matrix. 

Suppose  now  that  our  hypothetical  TN  has  uses  at  nodes  32,  74,  and  99  In  the 
matrix  of  Figure  4-5  these  nodes  are  in  linear  blocks  3,  5a,  and  7.  Qring  those 
columns  together  produces 

01111011S111S 

The  S entries  come  from  blocks  5a  and  7 and  indicate  that  the  TN  is  alive  in  those 
blocks  before  (and  including)  the  last  use  in  each  block.  The  1 entries  indicate  that 
the  TN  is  alive  throughout  blocks  lb,  lc,  2,  3,  4b,  4c,  5b,  5c,  and  6.  In  particular  the 
TN  is  not  alive  in  blocks  la  and  4a  and  in  block  5a  after  the  use  at  node  74. 

4.3  Reflection  on  lifetimes 

Let  us  reflect  on  the  importance  of  this  method.  The  result  is  very  important. 
By  repeating  the  process  for  each  TN  we  produce  a precise  specification  of  the 
lifetime  of  each  TN  This  means  that  we  can  determine  easily  and  accurately  whether 
two  TNs  interfere.  Two  TNs  are  said  to  interfere  or  conflict  with  each  other  if  there 
is  any  program  point  at  which  they  are  both  alive,  i.e.  if  the  element-wise  amd  of  the 
lifetime  vectors  for  the  two  TNs  contains  any  non  zero  elements.  This  is  exactly  the 
knowledge  needed  to  make  efficient  use  of  the  registers  in  the  compiled  code.  We 
may  assign  two  TNs  to  the  same  location  only  if  they  do  not  interfere.  Note  also  that 
the  method  is  completely  independent  of  any  language  or  target  machine.  Cnee  we 
are  given  the  linear  blocks  and  their  successors  along  with  the  creation  and  use 
points  the  rest  is  a mechanical  process.  Warshall’s  algorithm  for  finding  the  closure 
allows  us  to  transform  a matrix  into  its  closure  in  a single  pass  over  its  elements. 

At  this  point  let  us  ask  why  it  is  necessary  and/or  desirable  to  expend  the 
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computational  effort  required  to  generate  the  connection  matrices  and  their  closures. 
There  are  much  simpler  algorithms  which  will  produce  good  approximations  of  the 
lifetimes  with  much  less  effort,  First,  we  must  remember  that  our  ultimate  goal  is  to 
produce  compilers  which  generate  the  best  possible  code  (the  metric  being 
determined  by  the  implementor).  The  simpler  algorithms  will  always  lead  us  to  safe 
decisions;  that  is,  we  will  never  be  told  that  two  TNs  do  not  interfere  when  in  fact 
they  do.  The  problem  is  that  the  approximate  lifetimes  may  exclude  the  best  solution 
from  the  set  of  feasible  solutions.  As  long  as  the  extra  effort  is  not  unreasonable, 
our  goal  requires  that  we  use  the  exact  solutions  to  such  problems.  Second,  we 
recognize  that  the  simpler  algorithms  do  yield  exact  solutions  in  many  cases.  To 
exploit  this  fact  we  use  the  simpler  methods  when  possible  by  dividing  the  TNs  into 
two  classes:  interesting  and  uninteresting.  The  uninteresting  TNs  are  those  whose 
lifetimes  may  be  determined  exactly  by  a simple  method.  Their  lifetimes  always 
consist  of  a single  segment  of  the  program  graph  from  the  first  reference  to  the  la*t 
reference.  The  method  described  in  this  chapter  is  used  only  for  the  interesting  TNs, 
those  which  might  have  lifetimes  composed  of  disjoint  program  segments.  In  general 
the  interesting  TNs  are  user  variables  and  common  subexpressions;  the  uninteresting 
TNs  are  compiler  generated  temporaries.  We  make  the  division  into  the  two  classes 
by  declaring  that  any  TN  which  has  more  than  one  creation  or  is  referenced  in  more 
than  one  linear  block  is  interesting.  One  of  the  two  conditions  is  necessary  (but  not 
sufficient)  to  produce  a lifetime  of  disjoint  segments. 
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4.4  Summary  of  Lifetimes 

In  this  chapter  we  have  presented  a method  of  determining  the  lifetime  of  a TN, 
that  is,  the  points  in  the  program  at  which  the  value  contained  in  the  TN  must  be 
preserved.  The  method  depends  only  on  collection  of  data  about  creation  and  use 
points  and  on  knowledge  of  the  flow  of  control.  Because  we  would  like  to  have  a 
very  precise  representation  of  the  lifetimes,  we  want  to  consider  each  program  node 
separately.  However,  even  with  WarshalPs  algorithm,  finding  the  closure  of  the 
connection  matrix  is  an  n2  process  and  n grows  very  quickly.  This  realization  led  to 
the  development  of  a shorthand  matrix  on  which  the  closure  could  be  performed. 
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Chapter  5 

The  Packing  Problem 
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This  chapter  will  describe  an  algorithm  which  assigns  the  TNs  to  the  physical 
locations  available  within  the  target  machine.  The  lifetime  information  generated  by 
the  procedure  described  in  the  previous  chapter  is  taken  as  an  Input  to  the  packing 
algorithm.  The  costs  associated  with  each  TN  and  the  restrictions  placed  on  the 
assignment  of  the  TN  to  physical  locations  are  the  other  inputs.  In  the  ideal  case,  it  is 
possible  to  assign  each  TN  to  a location  which  minimizes  its  cost;  in  practice,  this  is 
frequently  not  possible.  Thus  the  packing  algorithm  must  handle  two  slightly 
different  but  related  problems.  The  first  problem  is  related  to  problems  known  in 
operations  research  as  "cutting-stock"  or  "knapsack"  problems  [Gil66].  The  locations 
to  be  used  for  temporary  storage  represent  the  stock  from  which  pieces  (TNs)  are  to 
be  cut  or  the  knapsacks  into  which  items  (TNs)  are  to  be  packed.  The  measure  of 
space  in  both  cases  is  in  terms  of  program  points.  The  TNs  not  only  require  a given 
number  of  points  but  particular  points.  In  this  respect  the  TN  packing  problem  is 
more  constrained  than  the  cutting-stock  or  knapsack  problems.  If  a TN  is  placed  into 
a location,  it  must  be  at  a fixed  position  and  orientation  in  the  space  of  program 
points  within  that  location.  In  a cutting-stock  problem  trie  task  is  to  cut  pieces  from  a 
piece  of  stock.  The  exact  position  and  orientation  of  the  pieces  is  not  specified.  The 
second  problem  deals  with  the  selection  of  TNs  to  be  assigned  to  the  preferred 
locations  (usually  registers).  The  problem  is  to  minimize  the  increase  in  cost  over  the 
cost  that  would  have  realized  if  all  TNs  could  have  been  packed  into  their  preferred 
locations. 

The  algorithm  described  considers  only  two  classes  of  storage  registers  and 
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memory,  The  algorithm  is  also  based  on  the  implicit  assumption  that  most  TNs  should 
be  assigned  to  registers  if  possible.  These  assumptions  simplify  the  algorithm 
considerably  and  are  a reasonably  accurate  characterization  of  current  computers. 
Extension  to  more  than  two  classes  of  storage  will  be  considered  after  the  algorithm 
is  presented.  The  algoiithm  described  will  produce  an  optimal  solution  to  the  packing 
problem,  i.e.  will  minimize  the  increase  in  cost  due  to  inability  to  assign  all  TNs  to 
registers. 

5.1  The  problem 

The  problem  of  assigning  the  TNs  to  the  set  of  available  registers  is  the  many- 
few  allocation  discussed  by  Day  [Day70],  That  is,  there  are  a number  of  TNs  which 
must  be  assigned  to  relatively  few  locations.  When  the  number  of  TNs  is  not  larger 
than  the  number  of  locations  then  the  solution  to  the  problem  is  trivial.  Added 
complications  are  the  preferences  noted  during  TN  processing  and  the  restrictions  on 
the  locations  to  which  certain  TNs  may  be  assigned.  Let  us  first  consider  the  many- 
one  allocation  problem,  that  is,  reduce  fh©  problem  to  consider  only  one  register. 

Assume  that  the  TNs  are  represented  by  a sequence  T in  which  the  ith  element 
Tj  represents  the  ith  TN  in  some  ordering.  Define  a vector  p of  which  each  element 
Pi  is  the  profit  associated  with  assigning  the  ith  TN  to  a register.  The  profit  is  taken 
to  be  the  difference  between  the  two  cost  measures  calculated  for  the  TN  during 
temporary  name  processing  as  described  in  Chapter  2.  A selection  vector  x is  used 
to  identify  which  TNs  have  been  selected  for  assignment  to  the  register  (*;»1  if  T;  is 
assigned  to  the  register,  *.“0  otherwise). 

Two  TNs  are  said  to  interfere  if  their  lifetimes  have  any  points  in  common. 
Total  interference  exists  among  the  TNs  in  a set  N if  nj  interferes  with  nj  for  all 


n,,n j < N,  Mj.  Let  N*  be  a subset  of  N such  that  there  is  total  interference  among  the 
elements  of  N*.  Let  N**  be  a set  of  subsets  giving  a complete  description  of  the  total 
interference  data;  N**  = {N*,  N^,  . . N* ).  This  implies  that  if  nj  interferes  with  np 
then  there  exists  at  least  one  i such  that  N*  < N**  and  {n j,np}  cNj . Let  A be  an 
interference  matrix  (which  has  dimension  (m  x k)  when  there  are  k TNs)  such  that 
a | « 1 if  n;  < N*;  otherwise  a:j  = 0.  The  selection  vector  x is  a solution  to  the 
assignment  problem.  The  solution  is  feasible  if  no  two  TNs  selected  by  x interfere. 
This  condition  is  expressed  by  Ax<l,  i,e.  each  element  of  the  product  of  A and  x is 
meaning  that  there  is  at  most  one  TN  occupying  the  register  at  any  program  point. 


The  optimal  solution  to  the  many-one  assignment  problem  is  the  solution  of  the 
integer  programming  problem: 

maximize  z - px 

subject  to  Ax<l 

where  x ^ i {0,1} 

aij  f {0,1} 

Pi  > 0 

This  problem  is  described  by  Balinski  [Bal65]  as  a weighted  set  matching  problem. 
There  is  no  krown  solution  to  this  class  of  problems  aside  from  examining  all  possible 
selection  vectors  and  evaluating  the  objective  function  of  each.  Fortunately  it  is  not 
necessary  to  actually  calculate  the  values  for  each  solution  (remember  that  for  n TNs 
there  are  2n  solutions). 
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a complete  solution  from  the  set  of  possible  optimal  feasible  solutions.  Implicit 
enumeration  is  the  process  of  excluding  a set  of  solutions  from  the  set  of  possible 
optimal  feasible  solutions  without  the  explicit  enumeration  of  each  element  of  the  set 
being  excluded.  Let  z'  be  the  value  of  the  objective  function  ( pz ) for  the  most 
profitable  feasible  solution  yet  obtained,  To  fathom  a partial  solution  Sp  is  to 
determine  that  either  there  is  no  feasible  completion  of  Sp  with  profit  greater  than  z 
or  that  there  is  a most  profitable  feasible  completion  of  Sp  with  profit  z >z  . If  a 
partial  solution  Sp  is  fathomed,  then  the  set  of  completions  of  Sp  is  implicitly 
enumerated.  The  key  to  finding  an  optimal  feasible  solution  is  finding  an  effective 
fathoming  procedure. 

Before  discussing  the  solution  further,  it  is  interesting  to  note  that  the 
procedure  used  to  find  an  optimal  solution  to  the  many-one  problem  may  be  used  to 
find  an  optimal  solution  to  the  many -few  problem.  An  obvious  approach  to  the  many- 
few  problem  is  to  treat  it  as  a sequence  of  many-one  problems.  With  this  approach 
an  optimal  assignment  is  found  for  one  of  the  registers.  The  remaining  TNs  are  then 
used  as  input  to  a second  many-one  problem  for  a second  register.  The  process 
continues  until  either  the  supply  of  registers  or  TNs  is  exhausted.  This  approach  has 
two  disadvantages:  it  may  produce  a non  optimal  solution  and,  more  importantly,  it 
does  not  allow  the  preference  data  to  be  consideied. 

A more  general  approach  to  the  many-few  problem  is  to  expand  the  many-one 
problem  by  a factor  of  m,  the  number  of  registers  to  be  considered.  The  structure  of 
this  problem  is  identical  to  that  of  the  many-one  problem  with  only  the  makeup  of  the 
matrices  changing.  The  problem  is  stated  as 
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maximize 

z - pV 

subject  to 

AV<1 

where 

ajj  * {0,1} 
Pi  > 0 

Here  p*  is  a sequence  pj,  P2,  • • Pm  where  each  pj  is  identical  to  the  original  profit 

vector  p of  the  many-one  problem.  Similarly  x*  is  a sequence  xj,  X2 xm  where 

x^  is  the  selection  vector  for  the  ith  register.  A*  is  a matrix  composed  of 
submatrices. 

A* . » I,  an  identity  matrix  of  order  (n  x n) 

when  i*j+l,  A*.  « A,  the  original  interference  matrix;  order  (p  x n) 

otherwise,  a!*,  - 0,  Mj  a zero  matrix  of  order  (p  x n). 

The  form  of  ma'.nx  A*  is  shown  in  rigure  5-1.  The  addition  of  the  identity  elements 
assures  that  no  TN  car  be  assigned  to  more  than  one  register.  In  the  abstraction  of 
the  integer  programming  problem  this  constraint  is  necessary  to  eliminate  solutions 
wrich  might  achieve  a high  profit  by  assigning  a particularly  profitable  TN  to  all  of  the 
registers. 

5.2  The  procedure 

The  complete  enumeration  of  the  solutions  to  the  ’jssignment  problem  is 
obtained  by  a branch  and  bound  procedure  [Mit70],  Such  methods  use  a branching 
procedure  to  generate  an  ordered  sequence  of  partial  solutions  and  a set  of 
bounding  rules  to  provide  a lower  bound  on  the  value  of  the  objective  function  for 
each  possibly  optimal  feasible  completion  of  each  partial  solution.  The  procedure, 
shown  in  Figure  5-2,  begins  with  no  TNs  assignea  to  registers.  One  by  one  TNs  are 
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Figure  5-1.  The  expanded  interference  matrix. 

assigned  to  a register  (maintaining  the  feasibility  of  the  partial  solution)  jntil  the 
resulting  partial  solution  is  fathomable.  At  this  point  the  complete  solution  with  the 
largest  value  for  the  objective  function  is  remembered  and  all  completions  of  the 
current  partial  solution  are  implicitly  enumerated.  The  last  TN  assigned  in  the  partial 
solution  is  then  removed  from  its  register  and  the  procedure  !$  repeated  from  the 
fathoming  step/  The  process  terminates  when  all  solutions  have  been  enumerated, 
i.e.  when  the  initial  (empty)  partial  solution  is  fathomable. 

5.2.1  The  fathoming  procedure 

The  fathoming  procedure  is  fairly  simple.  If  the  current  partial  solution  Is  $p, 
then  let  us  define  to  be  the  completion  of  Sp  which  assigns  to  registers  all  TNs 
which  are  free  in  Sp.  If  the  value  of  the  objective  fui  stion  associated  with  is  not 
greater  than  the  largest  objective  function  value  yei  observed,  then  there  is  no 
completion  of  Sp  which  will  produce  a larger  value  and  Sp  is  fathomed.  Othe-wise  Sp 
is  not  fathomed  and  the  procedure  of  Figure  5-2  will  try  to  assign  the  next  TN  to  a 


+ If  there  is  a possibility  that  assigning  the  removed  TN  to  a different  register  may 
produce  a better  feasible  completion  then  this  may  be  t-ied. 
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register.  Regardless  of  whether  the  attempt  to  assign  the  next  TN  to  a register  is 
successful,  the  result  is  a new  partial  solution  derived  from  the  old  solution  by 
assigning  a vaLe  to  one  of  the  free  elements  of  the  old  selection  vector.  This  new 
partial  solution  is  tnen  presented  to  the  fathoming  procedure. 

5.2.2  Backing  up 

When  a partial  solutioi,  is  fathomed,  the  next  step  is  to  "undo"  the  mos  recent 
assignment  of  a TN  to  a register  reflected  in  Sp,  i.e.  the  last  i is  changed  to  0.  All 
elements  of  Sp  following  the  changed  element  are  free.  An  attempt  is  now  made  to 
fathom  the  nev  partial  solution.  Following  this  procedure,  the  partial  solutions 
fathomed  have  more  and  more  free  elements.  Whpn  a partial  solution  containing  k 
free  elements  is  fathomed,  2^  solutions  are  implicitly  enumerated. 

5.2.3  Assigning  another  TN  to  a register 

The  process  of  attempting  to  assign  a new  TN  to  a register  consists  of  trying 
sr  veral  steps  to  find  a location  to  which  the  TN  can  be  assigned  without  interfering 
with  any  TN  already  assigned.  In  the  following  discussion  of  the  steps,  a register  is 
said  to  be  open  it  there  are  any  TNs  assigned  to  it  and  closed  otherwise. 

1.  If  the  TN  has  a preference  for  assignment  then  try  to  assign  the  TN 
to  one  of  fh  , prefered  location1;.  (This  step  is  expanded  in  Section 
5.2.4.) 

2.  Try  to  find  an  open  register  to  which  the  TN  can  be  assigned. 

3.  If  there  are  any  closed  registers,  open  one  and  assign  the  TN  to  it. 

4.  If  steps  1 through  3 fail  to  assign  TN  to  a location  then  the  attempt  is 
unsuccessful.  The  new  partial  solution  will  have  the  rorrespondir.g 
element  of  the  selection  vector  equal  to  0. 
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5.2.4  Honoring  preferences 

As  discussed  in  Chapter  2,  some  TNs  are  "prefe^enced  to  other  TNs.  The 
intent  of  a preferencing  operation  is  to  indicate  that  there  is  some  additional  benefit 
to  be  gained  by  assigning  the  ' . <■  TNs  to  the  same  location,  be  it  register  or  memory. 
In  terms  of  the  generated  code,  failing  to  honor  a preference  means  that  at  some 
point  a value  will  have  to  be  moved  from  one  TN  to  the  other. 

Suppose  the  TN  being  considered  by  the  assignment  algorithm  is  T and  that  T 
has  preferences  P1(  . . .,  P^  The  algorithm  for  honoring  preferences  iterates 
through  the  P/s.  If  any  P(  has  been  assigned  to  a register  and  T can  be  assigned  to 
that  register,  then  the  assignment  is  made  and  the  preference  is  honored.  It  may 
happen  that  none  of  the  preferences  has  been  assigned  to  a register,  or  that  T is 
unable  to  fit  into  any  of  the  registers  to  which  Pj’s  have  been  assigned.  In  these 
cases  the  prefereice  usually  cannot  be  honored.  However,  it  may  also  happen  that 
some  preference  Pj  has  been  assigned  to  a memory  location.  If  the  profit  associated 
with  T is  not  greater  than  the  cost  of  a move  from  memory  to  a register,  then 
assigning  T to  the  preferred  memory  location  will  result  in  a better  overall  solution. 
If  nr ne  of  the  preferences  have  been  assigned  to  locations  or  T cannot  fit  into  any 
of  the  preferred  locations  then  the  preference  cannot  be  honored. 

Note  that  at  this  point  the  only  TNs  which  have  been  assigned  to  memory 
locations  are  those  which  are  restricted  to  be  in  memory.  No  feas.ble  solutions  which 
might  have  higher  profit  are  lost  when  a preference  for  a memory  location  is  honored 
as  long  as  the  profit  loss  due  to  such  an  assignment  is  not  greater  than  the  cost  of  a 
move  from  memory  to  a register. 


t In  a typical  Bliss-11  program,  the  value. of  n is  small  (2  or  less).  Constructs  can  be 
generated,  however,  which  result  in  large  numbers  of  preferences. 
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5.2.5  An  intuitive  view  of  the  procedure 

The  above  procedure  can  be  viewed  as  a packer  operating  on  a vector  of  TNs. 
The  TNs  in  the  vector  are  arranged  in  order  of  decreasing  importance.  As  we  have 
described  importance  this  means  that  the  TNs  we  would  most  like  to  have  assigned  to 
registers  are  considered  first.  In  this  way  we  are  more  likely  to  be  able  to  assign 
the  important  TNs  to  registers  because  the  registers  are  "less  crowded"  when  the 
important  TNs  are  considered.  The  packer  considers  each  TN  In  the  vector  in  turn, 
assigni  ig  as  many  as  possible  to  registers.  When  the  end  of  the  vector  is  reached, 
an  initial  complete  solution  has  been  generated.  The  packer  has  assigned  as  many 
TNs  as  possible  to  registers,  given  the  order  in  which  they  were  considered.  The 
only  fact  that  keeps  us  from  stating  that  this  is  an  optimal  solution  is  the  possibility 
that  -emoving  some  set  of  TNs  from  the  registers  might  allow  the  assigning  of  some 
other  set  of  TNs  with  a larger  combined  importance. 

After  producing  the  initial  complete  solution,  the  packer  works  backward  in  the 
vector  reconsidering  the  assignments.  At  each  step  the  packer  asks  whether 
removing  the  current  TN  from  its  register  might  lead  to  a more  profitable  complete 
solution.  Sometimes  the  answer  is  that  there  can  be  no  more  profitable  completion 
and  the  packer  continues  backward  in  the  vector.  At  other  times  there  is  a more 
profitable  completion  nd  the  packer  moves  forward  in  the  vector  again  to  investigate 
the  feasibility  of  the  more  profitable  comple^on.  When  there  is  no  more  profitable 
completion,  all  of  the  st  unions  in  all  completions  are  implicitly  enumerated.  The 
algorithm  terminates  when  there  are  no  solutions  (i.e.  completions  of  the  empty  partial 
solution)  which  can  produce  a larger  profit  than  the  most  profitable  solution  already 
obtained. 
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5.3  The  practicality  of  obtaining  an  optimal  solution 

The  algorithm  described  above  is  essentially  the  same  as  the  OPTSOL  algorithm 
described  by  Day  for  sol’  ing  the  many-one  assignment  problem.  Day  proves  that  the 
algorithm  terminates  only  after  enumerating  all  solutions  and  yields  an  optimal 
solution.  Mitten  [Mit7Cj  also  gives  a proof  of  the  general  branch-and-bound 
procedure.  Day  reports  experimental  results  of  using  OPTSOL  to  solve  400  randomly 
generated  assignment  problems.  The  results  show  that  the  time  required  to  solve  a 
problem  is  a function  of  both  the  size  of  the  problem  (number  of  TNs)  and  the  density 
of  nonzero  elements  in  the  interference  matrix  C.  Execution  time  was  directly  related 
to  problem  size  and  inversely  related  to  C matrix  density.  The  disheartening  result  is 
that  the  average  execution  over  all  densities  of  C are  large  and  grow  very  rapidly. 
For  one  register  the  following  average  execution  times  (on  an  IBM  360/65)  were 
observed 


number  TNs  t.me  (sec) 

24  <1 

32  2 

48  6 

64  46 

We  see  that  for  moderate  sized  problems^  the  execution  times  quickly  become 
unreasonable.  Day  indicates  tha*  tor  more  than  one  register  the  times  grow  much 
faster.  Appealing  to  the  argument  of  Section  4.3,  we  cannot  expect  that  the  branch- 
and-bound  algorithm  to  guarantee  an  optimal  solution  in  a reasonable  amount  of  time. 
What  we  can  do  is  measure  the  rate  of  progress  of  the  algorithm  toward  termination 
or  the  fraction  of  the  total  number  of  solutions  enumerated.  Care  in  choosing  the 


I 


t In  real  programs  observed  by  the  Bliss-il  compiler,  the  number  of  TNs  per  routine 
averages  less  than  20  because  routines  tend  to  be  small.  However  routines  having 
50  or  more  TNs  are  fairly  common. 
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order  in  which  TNs  are  added  to  partial  solutions  will  increase  the  probability  that  an 
optimal  or  near  optimal  solution  is  found  early  in  the  search.  The  branch  and 
procedure  is  more  effective  if  a feasible,  near  optimal  solution  is  taken  instead  of  the 
empty  solution  as  the  starting  point.  Day  gives  a procedure  for  finding  such  a 
solution  quickly  and  notes  that  the  profit  of  the  initial  solution  is  almost  always 
greater  than  907.  of  the  profit  of  an  optimal  feasible  solution.  Day  suggests  that  the 
initial  solution  is  a close  enough  approximation  to  an  optimal  solution  to  make  it  usable 
and  can  be  computed  quickly  enough  to  make  it  practical.  It  should  be  noted  here 
that  Day's  results  are  for  randomly  generated  profits  and  interference  matrices.  The 
initial  solution  described  in  Chapter  6 for  real  programs  almost  always  produces  an 
optimal  solution. 

5.4  Formulation  of  the  more  general  problem 

Section  5.1  describes  the  formulation  of  the  general  integer  programming 
problem  for  assigning  TNs  to  one  of  several  registers  by  replicating  the  profit  vector 
and  interference  matrix  for  each  register.  In  this  section  we  show  how  this  same 
reasoning  can  be  used  to  express  a much  more  general  problem.  Thus  far  we  have 
assumed  that  there  are  only  two  classes  of  storage  available  for  TNs,  a limited 
number  of  identical  registers  and  an  unlimited  number  of  memory  locations. 

Suppose  that  the  machine  for  which  we  wish  to  compile  code  has  more  than  two 
classes  of  storage.  For  example  the  Univac  1108  has  47  registers  available  to  user 
programs.  Of  this  number  11  are  index  registers,  12  are  arithmetic  accumulators,  4 
are  both  index  registers  and  accumulators,  17  are  fast  memory  locations,  2 have 
special  functions  in  certain  instructions  and  1 is  modified  by  real  time  clock  interrupts. 
In  such  a machine,  there  are  several  levels  of  trade-offs  when  trying  to  decide  which 
of  the  many  classes  of  storage  should  be  used  for  any  particular  TN. 
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Consider  a machine  with  n storage  locations  divided  into  m Casses.  As  before 

F0r  each  TN  compute  asa.-h ' J - - ^ ''  “ "" 

profit  realized  by  assigning  ,e  TN  .0  a member  0,  Cass  i instead  o,  .0  memory.  This 

,s  identicat  ,0  the  profit  vector  caicot.ted  in  Section  5,  if  the  only  two  Casses  o, 
worst  case  solution.  The  problem  is  now 


maximize 
subject  to 
where 


z = p*x* 


A*x*<l  . 
{0.1} 
an  <{0,1} 

P4>0 


as  before,  bo,  fhe  subvectors  0,  P*  are  no  longer  identic,..  Recaii  

formulation  of  the  many-few  register  problem  o,  Section  5.1.  the  subvectors  of  P 
yrere  reactions  o,  the  singie  profit  vector,  tn  the  more  genera,  probiem  we  choose 

,or  the  1th  subvecto Cor  0,  profits  associated  with  the  i.h  storage  location. 

ln  terms  o,  the  abstract  linear  probiem  the  solution  ,0  this  problem  is  no  more 
difficult  tha,  the  solution  o,  the  register-memory  problem  o,  the  same  size,  tn 

practice,  however,  as  the  number  o,  Casses  o,  storage  increases «* 

complexity  of  the  restrictions  placed  on  TNs  tends  to  increase. 


5.5  Summary  of  packing 


The  packing  problem,  as  described  in  this  chapter,  is  the  problem  o,  binding  a 
large  number  of  TNs  to  a relatively  small  number  of  locations. 

concerned  with  the  solution  o,  the  genera,  paching  problem, concentrated  on 

Ihe  situations  in  which  there  are  only  two  classes  o,  storage,  one  desirable 
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and  one  generally  undesirable.  We  have  shown  that  the  problem  can  be  formulated 
as  an  integer  programming  problem  and  that  the  form  of  the  problem  is  essentially  the 
same  regardless  of  whether  we  consider  one  register,  several  registers,  or  several 
classes  of  storage  each  containing  several!  distinct  locations.  Section  5.2  described  a 
branch-and-bound  procedure  which  yields  an  optimal  solution  to  the  problem.  We 
have  also  noted  that  while  the  number  of  possible  solutions  to  be  examined  by  the 
procedure  is  exponential  in  the  number  of  TNs,  the  algorithm  terminates  much  sooner 
if  we  can  determine  a feasible,  near  optimal  solution  from  which  to  begin  the  search 
of  the  solution  space. 


/ 


Chapter  6 
A Test  Case 


This  chapter  will  describe  the  implementation  of  the  TNBIND  algorithms  in  the 
compiler  for  Bliss-11,  a dialect  of  Bliss  [Wul71]  for  the  PDP-11.  The  essential 
features  of  both  the  language  and  the  machine  are  described  below.  The  standard 
version  of  the  Bliss-11  compiler  uses  an  initial  version  of  TNBIND  and  thus  provides  a 
good  comparison  for  some  of  the  more  advanced  Ideas.  The  fact  that  the  language  is 
Bliss  does  not  have  a significant  effect  on  the  implementation!  the  language 
dependencies  for  any  Algol-like  language  would  be  nearly  identical.  The  target 
machine  has  significant  effect  on  the  TNBIND  algorithms,  but  mainly  influences  the 
cost  measures  and  the  kinds  of  targeting  and  preferencir.g  done. 

The  decision  to  use  Bliss-11  as  a test  has  both  advantages  and  disadvantages. 
On  the  positive  side,  the  existing  compiler  is  highly  modular  and  it  was  easy  to 
"unplug"  the  existing  register  allocation  phase  and  "plug  in"  the  new  TNBIND  with 
relatively  few  changes  to  the  rest  of  the  compiler.  This  would  not  have  been  the 
case  with  a compiler  which  started  with  a more  traditional  register  allocation  method. 
It  is  also  intended  that  the  new  TNBIND  will  become  a part  of  the  standard  compiler 
thus  providing  real  world  benefits.  On  the  negative  side,  in  order  to  produce  a 
working  compiler,  TNBIND  had  to  incorporate  all  of  the  functions  provided  by  the 
original  version.  These  included  assigning  TNs  to  those  nodes  which  needed  them, 
generating  linear  block  information,  and  assigning  labels  for  the  final  code  stream. 
The  fact  that  the  TNBIND  algorithm  has  been  implemented  for  only  one  language- 
machine  pair  raises  the  question  of  whether  the  algorithms  would  perform  as  well  on 
other  languages  or,  more  importantly,  other  machines.  TNBIND  does  not  include  some 
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of  the  more  traditional  register  optimizations  because  these  were  not  critical  to  the 
goals  of  Bliss-11.  Chapter  7 will  discuss  how  these  operations  can  be  easily 
incorporated  into  the  TNBIND  philosophy. 

6.1  About  Bliss 

Bliss  falls  into  the  class  of  Algol-like  languages.  It  has  the  features  normally 
associated  with  such  languages:  block  structure,  recursive  routines,  loops, 

conditionals,  local  variables.  Bliss  was  designed  as  a systems  implementation 
language  and  therefore  presumes  little  Or  no  runtime  support.  The  emphasis  is  on 
flexibility  and  runtime  efficiency.  Bliss  provides  the  p.ogrammer  with  the  ability  to 
perform  arbitrary  address  calculations  at  runtime  and  to  address  parts  of  words  when 
appropriate.  More  complete  descriptions  of  the  language  may  be  found  in  [Wul71], 
[DEC74],  and  [Wul75], 

The  programming  style  of  Bliss  tends  to  be  considerable  different  from  Algol  or 
other  languages.  Because  there  in  little  overhead  in  routine  calls,  programmers  are 
encouraged  to  write  small  routines.  In  addition  the  control  structures  observed  in 
Bliss  programs  tend  to  be  well  structured  because  there  is  no  goto  statement  in  Bliss. 
The  control  structure  of  Bliss  makes  the  linear  block  analysis  quite  easy.  The  small 
size  of  the  routines  usually  means  that  the  numbers  of  linear  blocks  and  TNs  are 
small.  These  two  factors  combine  to  make  the  first  pass  implementation  of  TNBIND 
acceptable  in  terms  of  computing  time  required.  As  routines  get  larger  and  have 
more  linear  blocks  and  TNs,  the  time  required  for  the  TNBIND  algorithms  increases 
dramatically.  On  the  one  hand  this  tells  us  that  we  will  need  better  algorithms  in 
order  to  deal  effectively  with  large  routines.  On  the  other  hand  we  can  argue  that 
smaller  routines  and  simpler  control  structures  are  the  "wave  of  the  future,"  and 
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therefore  it  is  not  unreasonable  to  design  algorithms  which  perform  better  for  small 
routines  than  for  large  routines. 

6.2  About  the  PDP-11 

The  PDP-11,  manufactured  by  Digital  Equipment  Corporation  [DEC71],  falls  into 
the  class  of  mini-computers."  It  has  a 16-bit  word  length  which  is  addressed  in  units 
of  8-bit  bytes;  thus  a 16-bit  word  can  address  216  bytes  or  215  words.  Instructions 
come  in  0-  1-  and  2-operand  formats  and  may  be  1,  2,  or  3 words  In  length.  Each 
operand  uses  six  bits  of  the  first  instruction  word  and  may  require  one  additional 
word  to  hold  a 16-bit  address  or  index  quantity.  The  PDP-11  has  eight  registers!  six 
of  these  are  general  registers,  one  is  used  by  both  hardware  and  software  as  a stack 
pointer,  and  one  is  the  program  counter.  Instructions  whose  operands  are  in 
registers  are  both  faster  and  smaller.  The  fact  that  any  instruction  can  have  any  of 
its  operands  in  registers  or  memory  locations  means  that  almost  all  TNs  are  of  the 
kind  that  can  be  assigned  to  either  a register  or  memory  location.  This  makes  the  TN 
packing  a simple  cost  minimizing  procedure  with  few  restrictions  on  the  locations  to 
which  any  TN  may  be  assigned. 

Bliss  makes  use  of  locations  in  the  stack  for  local  storage  when  there  are  no 
registers  available.  Because  the  PDP-11  hardware  uses  the  same  stack,  e.g.  to  save 
the  processor  state  during  an  interrupt  sequence,  all  stack  locations  must  be  allocated 
before  they  are  used.  A stack  location  is  allocated  when  it  is  at  or  below  the  location 
pointed  to  by  the  stack  pointer  register.  Stack  locations  are  allocated  explicitly  by 
adding  a constant  to  the  stack  pointer  register  or  implicitly  by  pushing  parameters. 
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6.3  Bliss- 1 1 implementation  of  TNBIND 

In  the  Bliss-11  compiler  TNBIND  is  presented  with  a tree  representation  of  a 
routine  to  be  compiled.  The  tree  is  traversed  in  a recursive  depth-first  tree  walk. 
There  is  a separate  routine  for  each  type  of  tree  node.  Several  of  these  routines 
were  shown  in  Chapter  3.  A switching  routine,  TNP,  performs  some  common  functions 
(like  updating  the  Icn  value)  and  calls  the  appropriate  routine  for  each  node, 
language.  A complete  listing  of  the  TNP  routines  for  Bliss-11  appears  in  appendix  A. 
The  interesting  parts  of  TNBIND  are  those  that  are  affected  by  decisions  made  by  the 
language  designer/implementer,  Below  we  discuss  two  areas  in  which  non  obvious 
processing  is  done  in  the  Bliss-11  version  of  TNBIND. 

6.3.1  Subroutine  parameters 

In  most  machines  there  are  several  ways  to  call  subroutines  and  pass 
arguments.  In  Bliss-11  the  programmer  may  specify  whether  arguments  to  a 
subroutine  are  to  be  passed  in  registers  or  on  the  stack.  The  caller  is  responsible 
for  removing  parameters  from  the  stack  after  a call,  while  the  callee  is  responsible 
for  restoring  the  contents  of  any  registers  used  by  its  code  (except  for  the  register 
in  which  the  value  of  the  subroutine  is  returned).  In  processing  a subroutine  call 
node,  TNBIND  performs  the  following  actions  for  each  argument. 

1.  Call  TNP  to  do  TN  processing  for  the  argument  expression. 

2.  Generate  a new  TN  and  assign  it  to  the  location  in  which  the  argument 
will  be  passed  (either  a specific  register  or  a stack  location). 

3.  Preference  the  TN  of  the  argument  to  the  TN  of  the  location. 

TNBIND  thus  simulates  the  machine  code  implementation  of  the  subroutine  call  which 
might  be  expressed  as 
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for  each  argument  do 
begin 

compute  value; 

store  value  in  proper  place 

end; 

call  subroutine 

Lifetimes  are  generated  at  this  time  for  the  TNs  which  identify  the  argument 
locations.  These  lifetimes  extend  from  the  Ion  value  at  which  the  argument  will  be 
stored  to  the  Ion  value  of  the  actual  call.  This  prevents  the  argument  locations  from 
being  used  by  any  computations  necessary  to  produce  succeeding  argument  values. 
These  TNs  are  part  of  the  initial  state  input  to  the  packing  phase. 

After  the  subroutine  call,  the  locations  used  for  any  arguments  passed  on  the 
stack  remain  allocated.  Rather  than  deallocate  these  locations  immediately,  the  caller 
keeps  track  of  the  number  of  such  locations  on  the  stack  at  all  times  and  adjusts  the 
stack  pointer  only  when  two  or  more  flow  paths  join  and  the  stack  depth  is  not  the 
same  on  all  of  the  paths.  This  has  two  effects:  the  number  of  instructions  necessary 
to  adjust  the  stack  pointer  after  subroutine  calls  is  reduced,  and  the  locations  on  the 
stack  may  be  used  for  temporary  storage  by  later  calculations  or  local  variables. 
These  locations  are  called  dynamic  temporaries  because  they  are  dynamically 
allocated  and  deallocated  at  run  time.  Since  the  allocation  of  these  temporaries  is  a 
byproduct  of  a routine  call,  there  are  no  additional  instructions  needed  to  allocate 
these  locations.  Thus  we  have  a group  of  dynamically  allocated  locations  whose 
allocation  and  deallocation  overhead  costs  are  zero. 

Temporary  locations,  i.e.  registers  and  stack  locations,  may  be  either  'open  or 
"closed".  In  general  a location  is  "open"  when  it  is  available  for  use  in  a particular 
routine.  When  refering  to  registers,  "open"  means  that  some  TN  has  been  assigned  to 
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the  register  and  consequently  the  contents  of  the  register  will  have  to  be  saved  and 
restored.  When  refering  to  stack  locations,  "open"  means  that  the  location  is 
allocated,  i.e.,  below  the  current  value  of  the  stack  pointer.  Registers  are  either 
open  or  closed  throughout  a routine,  but  stack  locations  may  be  open  and  closed  at 
several  points  within  the  routine.  Each  location  is  represented  by  a list  of  the  TNs 
assigned  to  the  location.  A location  is  closed  when  its  list  is  ^mpty;  dummy  TNs  are 
used  to  mark  the  locations  as  open,  When  a dynamic  temporfry  is  closed,  a dummy 
TN  with  a lifetime  corresponding  to  the  closed  period  is  added  to  the  appropriate  list, 
thus  making  it  unavailable  for  use  by  other  TNs  during  the  closed  period. 

6.3.2  The  store  operator 

SimpleStore  (Section  3.1.3)  is  very  important  in  Bliss-1 1.  Because  the  PDP-11 
can  perform  general  memory-tc -memory  operations,  a great  many  store  operations 
r.re  simple.  A store  operation  may  be  cimple  even  when  the  left  hand  side  is  an 
expression  or  an  own  or  external  * ariable.  The  important  feature  of  the  SimpleStore1 
predicate  is  that  the  machine  specific  details  of  which  operations  are  simple  is 
encoded  entirely  within  the  predicate.  It  is  not  necessary,  for  example,  to  modify  the 
routine  processing  the  "+"  operator  to  take  account  of  the  fact  that  many  "+" 
operations  are  simple.  The  routine  processing  the  store  operator  evaluates  the 
SimpleStore  predicate  for  each  store  operation  and  performs  the  binding  when 
possible.  When  the  store  is  not  simple,  the  right  hand  side  is  preferenced  to  the  left 
hand  side  just  in  case  the  lifetimes  of  the  TN's  involved  are  such  that  the  binding  is 
possible  after  all. 
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6.3.3  Cost  computations 

The  Bliss-11  version  of  TNBIND  counts  memory  references  as  the  cost  of 
accessing  a TN,  The  counts  include  both  the  reference  to  the  actual  value  and  the 
reference  to  memory  which  is  necessary  to  pick  up  an  address  or  index  value  in  a 
multi-word  instruction.  This  measure  tends  to  minimize  both  execution  time  and  code 
size  by  putting  heavily  used  TNs  in  registers,  which  results  in  faster  and  shorter 
instructions.  The  size  of  the  object  code  is  used  as  a static  indicator  of  the 
effectiveness  of  the  optimizations. 

A TN  assigned  to  a register  requires  no  memory  references  for  a direct  access 
and  one  memory  reference  for  an  indirect  access.  A TN  assigned  to  a memory  (stack) 
location  has  a cost  of  two  for  a direct  access  (one  to  get  the  stack  offset  and  one  to 
access  the  stack  location)  and  three  for  an  indirect  access.  Other  forms  of  access 
are  similarly  assigned  costs.  Indexing,  for  example,  has  a cost  of  two  if  the  TN  is  in  a 
register  because  the  hardware  will  handle  the  addition.  If  indexing  is  required  for  a 
TN  in  memory,  the  hardware  function  will  have  to  be  simulated  by  sortware  which  has 
a cost  of  between  five  and  nine  depending  on  whether  there  is  a free  register  at  the 
moment  the  indexing  is  required. 

During  the  TN  processing  tree  walk,  a minimum  and  maximum  cost  Is  computed 

for  each  TN.  The  cort  of  a TN  is  the  sum  of  the  costs  computed  for  that  TN  by  the 

t Because  TNBIND  cannot  always  assign  all  TNs  to  registers,  the  code  generation 
phase  of  compilation  is  sometimes  presented  with  these  nasty  problems.  In  the  case 
of  indexing  the  code  generator  will  load  the  value  of  the  TN  into  a free  register  if 
there  is  one  and  then  proceed  normally.  If  no  register  is  available,  then  code  must 
be  generated  to  explicitly  simulate  indexing.  This  predicament  is  a result  of  the 
decision  that  a TN  is  assigned  to  a single  location  throughout  its  life.  There  may  be  a 
register  available  at  the  time  the  indexing  is  needed,  but  unless  the  register  is 
available  throughout  the  lifetime  of  the  TN,  it  will  not  be  used  by  TNBIND.  It  is 
comforting  to  note  that  in  real  programs  it  is  seldom,  if  ever,  necessary  to  index  by  a 
memory  location. 
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nodes  that  reference  it.  The  cost  computed  at  a node  is  the  product  of  the  cost  of  a 
single  reference  and  the  number  of  references  required  *0  execute  the  function 
represented  by  the  node. 

6.3.4  Lifetimes 

During  the  TN  processing,  creation  anu  use  points  ».:i  terms  of  Ion  values)  are 
noted  for  each  TN  through  calls  on  NotaCreation  and  NoteUse  as  described  in  Chapter 
4.  NoteCreation  and  NoteUse  also  culled  some  additional  information  to  simplify  the 
lifetime  computations. 

The  first  (smallest  Ion  value)  and  last  access  are  noted  for  each  TN,  and  TNs 
which  have  more  than  one  creation  or  are  accessed  in  more  than  one  iinear  block  are 
flagged  as  possibly  having  lifetimes  composed  of  disjoint  program  segments.  As 
described  in  Chapter  4,  this  partitions  the  TNs  into  interesting  and  uninteresting 
subsets.  The  lifetime  of  an  uninteresting  TN  is  simply  all  of  the  Ion  values  between 
the  first  and  last  accesses.  Most  compiler  generated  temporaries,  notably  those  used 
for  expression  evaluation,  fall  into  the  uninteresting  subset  because,  in  general,  they 
have  one  creation  and  one  or  more  uses  within  a single  linear  block.  The  interesting 
TNs  are  those  whose  lifetimes  may  consist  of  noncontiguous  sequences  of  Ion  values. 
These  TNs  are  subjected  to  the  complete  lifetime  computation  describee!  in  Chapter  4. 

There  are  two  types  of  TNs  which  are  not  treated  by  the  lifetime  phase 
because  their  lifetimes  ure  determined  when  they  are  generated.  These  are  the  TNs 
which  hold  the  arguments  to  a routine  call  and  the  dummy  TNs  used  to  close  the 
dynamic  temporaries. 
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6.3.5  Ranking 

The  ranking  phase  computes  the  difference  between  the  maximum  and  minimum 
cost  measures  for  each  of  the  TNs  which  may  be  assigned  either  to  a register  or  to  a 
memory  location.  This  value  is  the  profit  associated  with  assigning  the  TN  to  a 
register.  The  output  of  the  ranking  phase  is  a list  of  all  of  these  TNs  sorted  by 
r of  it.  Those  TNs  which  must  be  assigned  to  a register  but  which  do  not  require  a 
specific  register  are  added  to  the  list  after  being  assigned  arbitrarily  large  profit 
values. 

6.3.6  Packing 

The  packing  phase  is  responsible  for  the  actual  binding  of  TNs  to  locations. 
The  objective  is  to  maximize  the  sum  of  the  profits  of  the  TNs  which  are  assigned  to 
registers.  A secondary  goal  i,i  cases  where  all  TNs  are  assigned  to  registers  is  to 
minimize  the  number  of  registers,  since  the  contents  of  any  register  used  will  have  to 
be  saved  and  restored. 

The  packing  phase  first  assigns  all  TNs  which  require  specific  registers.  These 
TNs  represent  arguments  passed  to  other  routines  in  specific  registers,  the  values 
returned  by  other  routines,  or  a request  from  the  programmer  to  put  some  variable  in 
a specific  register.  The  packing  phase  then  deals  with  the  sorted  list  of  TNs 
prepared  by  the  ranking  phase  The  algorithm  used  is  a slightly  modified 
implementation  of  the  algorithm  described  in  Chapter  5. 

The  TNs  are  considered  to  be  elements  of  a vector  sorted  by  profitability  with 
the  most  profitable  TN  first.  Corresponding  to  this  vector  is  a binary  vector 
indicating  (with  l’s)  which  TNs  are  assigned  to  registers.  The  binary  number  formed 
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by  the  ones  and  zeros  of  this  vector  can  be  taken  as  a numbering  of  the  solutions  of 
the  problem  of  which  TNs  are  bound  to  registers.  Thus  when  there  are  n TNs  to  be 
considered,  there  are  2n  solutions.  Ignoring  for  a moment  the  TNs  which  are 
restricted  to  specific  registers,  there  are  really  (m+l)n  solutions  where  m is  the 
number  of  registers  available.  Tlearly  not  all  of  these  solutions  are  feasible.  The 
goal  of  the  packing  algorithm  is  to  select  the  most  profitable  feasible  solution 

There  are  several  approaches  to  selecting  a feasible*  solution,  Below  are  listed 
three  possible  approaches.  The  three  are  given  in  order  of  increasing  complexity 
and  computing  time  required.  This  is  also  the  order  of  increasing  probability  of 
producing  an  optimal  solution  since  each  method  considers  a larger  portion  of  the 
solution  space  than  the  preceding  method. 


1.  Beginning  with  the  most  important  TN,  try  *o  bind  each  TN  In  turn  to 
some  register.  When  a TN  is  bound  to  a register  (using  a spatial 
analogy  we  say  it  "fits")  it  remains  there  and  the  corresponding 
element  of  the  binary  vector  is  set  to  one.  At  the  end  of  the  TN 
vector  the  solution  represented  by  the  binary  vector,  i.e.  the  one  we 
have  built  during  the  algorithm,  is  used. 

2.  Same  as  1 except  that  when  a TN  will  not  fit  into  any  register  we  try 
to  move  the  TNs  with  which  it  conflicts  into  other  registers.  This 
operation,  called  reshuffling,  is  aimed  at  correcting  decisions  made 
earlier  in  the  algorithm.  Whenever  we  find  a register  into  which  a TN 
will  fit,  we  bind  it  there  without  regard  for  whether  it  might  also  fit 
into  some  other  register. 

3.  Same  as  1 initially.  When  the  end  of  the  vector  of  TNs  is  reached,  we 
back  up  in  the  vector  and  set  the  corresponding  binary  vector 
element  to  zero.  We  remember  the  profit  value  and  the  bindings  of 
the  most  profitable  solution  yet  obtained.  If  it  might  be  possible  to 
obtain  a more  profitable  solution  by  binding  the  TN  just  removed  to  a 
different  register  or  by  not  binding  it  to  a register  at  all,  then  trying 
this  alternate  solution  by  moving  forward  in  the  sequence  again  as 
described  in  1. 
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The  reshuffling  operation  described  is  method  2 is  interesting.  The  term  is 
derived  from  the  effect  on  the  TN  assignments.  No  TNs  are  removed  from  registers 
or  added  to  registers.  The  TNs  assignments  to  specific  registers  are  merely 
permuted  or  shuffled.  The  reshuffling  in  method  2 is  called  "bottom  level"  because  it 
is  invoked  at  the  bottom  of  the  packing  search.  In  method  3 the  reshuffling  is  implicit 
in  the  search  algorithm,  but  is  performed  at  the  top  level,  i.e.  a TN  assignment  is  only 
chans«d  to  another  register  when  all  when  the  search  is  backing  up. 

Method  3 is  equivalent  to  the  branch  and  bound  procedure  described  in 
Chapter  3.  It  will  always  produce  an  optimal  solution,  but  may  explore  a large 
fraction  of  the  (m+l)n  solutions  in  the  process.  Success  in  using  procedure  3 
depends  on  being  able  to  decide  quickly  whether  a new  path  is  worth  exploring.  It  is 
easy  to  tell  whether  the  best  completion  of  a partial  solution  will  produce  a solution 
with  a larger  profit  than  any  solution  yet  obtained.  The  key  is  to  able  to  determine 
the  profit  of  the  best  feasible  completion  without  generating  the  completion.  For  the 
purposes  of  measurement  and  comparison,  all  three  methods  were  implemented.  The 
results  are  reported  in  Section  6.4. 

When  a particular  TN,  t,  is  to  be  packed  by  the  packing  procedure,  the  available 
locations  are  considered  in  a particular  order.  The  order  of  consideration  is 

1.  Register  preference.  If  t has  been  preferenced  to  some  t'  and  s'  has 
already  been  bound  to  a register,  then  the  register  to  which  t’  is 
bound  is  considered.  If  reshuffling  is  being  done  at  the  bottom  level 
then  it  is  invoked  if  necessary. 

2.  Stack  preference.  If  t is  preferenced  to  some  t'  which  has  been 
bound  to  a stack  location,  then  the  stack  location  is  considered  only  if 
the  profit  associated  with  the  TN  is  no  larger  than  the  cost  of  loading 
or  storing  a register  {see  Section  5.2,4). 
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3.  Open  register.  The  open  registers  are  considered  in  an  arbitrary 
order.  When  applicable,  bottom  level  reshuffling  is  invoked, 

4.  Closed  register.  A closed  register  is  selected  and  opened.  Ideally 
this  step  is  only  invoked  when  the  profit  associated  with  t is  greater 
than  the  cost  of  saving  and  restoring  the  register.  In  practice  it  is 
almost  always  correct  to  open  another  register  because  the  register, 
once  opened,  can  be  used  for  TNs  to  be  packed  later. 

After  the  packing  algorithm  terminates  there  may  be  TNs  which  have  not  been 
bound.  These  TNs  are  bound  to  stack  locations.  Dynamic  temporaries  are  used  when 
possible.  As  a last  resort  the  TN  is  bound  to  a static  temporary,  a stack  location 
which  is  explicitly  allocated/deallocated  at  routine  entry/exit. 

6.3.7  A new  kind  of  restriction 

Because  there  are  usually  enough  memory  locations  there  is  no  incentive  to 

expend  large  amounts  of  computing  time  to  produce  an  optimal  packing  of  the 

+ 

remaining  TNs  into  the  smallest  number  of  memory  locations. 

Because  of  the  nature  of  the  variable  length  instructions  of  the  PDP-11,  it  is 
sometimes  desirable  to  load  the  address  of  a frequently  used  variable  or  a frequently 
called  routine  into  s register  and  then  reference  the  variable  or  routine  indirectly 
through  the  register.  The  semantic  analysis  phase  can  generate  TNs  to  hold  the 
addresses  when  the  number  of  accesses  passes  some  threshold.  A problem  arises, 
however,  if  the  solution  produced  by  the  packing  algorithm  does  not  bind  one  of 
these  TNs  to  a register.  If  a TN  used  to  hold  a fixed  address  is  bound  to  a memory 
location,  the  resulting  code  is  worse  (by  any  metric)  than  if  the  TN  had  not  been 


T Clearly  we  can  produce  programs  which  require  any  arbitrarily  large  number  of 
memory  locations  to  hold  TNs.  The  concern  here  is  to  do  very  well  under  ordinary 
circumstances  and  not  worry  too  much  about  'pathological  programs. 
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generated.  In  order  to  accommodate  this  type  of  TN,  a new  type  of  restriction  was 
devised.  Just  as  some  TNs  are  marked  "must  be  a register"  or  must  be  a specific 
register,"  these  TNs  generated  to  hold  addresses  are  marked  "register  or  forget  it 
meaning  that  if  binding  to  a register  is  not  possible  then  the  TN  should  not  be  bound 
to  a stack  location  but  rather  should  be  bound  to  the  variable  in  question.  This  type 
of  TN  opens  tho  way  for  including  many  of  the  classic  register  optimizations  into  the 
TNBIND  philosophy.  Because  thesp  optimizations  are  not  included  in  the  Bliss-11 
compiler,  they  will  not  be  discussed  here;  Chapter  7 will  describe  these  optimizations 
and  how  they  would  be  included  in  TNBIND.  The  possibility  of  including  these 
optimizations  increases  the  credibility  of  TNBIND  as  a method  for  register  allocation 
for  other  languages  and  particularly  for  other  machines. 

6.4  Measurements  of  TNBIND 

This  section  will  present  a comparison  of  several  variations  on  the  TNBIND 
philosophy.  The  intent  is  to  measure  TNBIND,  in  terms  of  quality  of  output  code  and 
compilation  speed,  as  compared  with  more  traditional  methods  of  register  allocation. 
We  also  want  to  explore  the  space  of  possible  TNBIND  algorithms  not  included  in  the 
Bliss-11  compiler.  It  is  not  reasonable  to  build  several  register  allocators  merely  in 
order  to  make  these  comparisons.  Instead  we  simulate  the  actions  that  would  be 
taken  by  several  different  register  allocators  by  "turning  off"  various  pieces  of  the 
TNBIND  optimizations. 

The  compilers  we  will  be  considering  are: 

1.  PB  - The  "production  version"  of  the  Bliss- 11  compiler.  This  compiler 
is  in  use  daily  for  many  programming  projects.  It  incorporates  an 
initial  version  of  TNBIND  with  simplified  lifetimes  and  has  been 
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carefully  tuned  to  produce  high  quality  code  over  a period  of  several 


years, 


2.  TB  - The  compiler  with  all  TNBIND  features  enabled.  This  includes  the 
matrix  closure  to  produce  exact  lifetime  representations  and  the  full 
branch  and  bound  search  for  an  optimal  solution  to  the  packing 
problem.  (As  a practical  matter,  the  search  was  limited  to  a small 
fraction  of  the  solution  space  by  limiting  the  number  of  solutions  that 
wefe  examined.  Because  of  the  nature  of  the  search,  finding  an 
optimal  solution  for  very  large  routines  could  take  many  hours  of 
computing  time.) 


3.  TBR  - The  same  as  TB  except  that  simplified  reshuffling  is  done  at  the 
bottom  of  the  search  rather  than  the  top.  That  is,  when  a TN  will  not 
fit  into  any  register  or  will  not  fit  into  a preferred  register,  a search 
is  made  to  see  if  moving  any  one  TN  would  allow  the  fit.  This  action 
brings  the  reshuffling  closer  to  problem  it  attempts  to  solve  at  the 
cost  of  not  enumerating  all  possible  solutions. 


4.  TBS  - Instead  of  the  complete  enumeration  in  the  packing  phase,  TBS 
uses  a simple,  one  pass  algorithm  in  which  the  packing  of  each  TN  is 
attempted  only  once  along  with  the  bottom  level  reshuffling  of  TBR. 
The  TNs  are  considered  in  order  of  decreasing  cost. 


5.  DUMB1  - All  of  the  "clever"  parts  of  TNBIND  are  disabled.  In 
particular 

a.  All  lifetimes  are  taken  to  be  continuous  from  first  access  to  last 


access. 


b.  TNs  are  considered  for  packing  in  order  of  first  access.  This 
simulates  a register  allocator  making  a pass  over  the  object  code 
assigning  registers  as  they  are  needed, 

c.  Ail  "register  or  forget  it"  TNs  are  forgotten. 

d.  Preferencing  is  turned  off 

e.  There  is  no  attempt  at  reshuffling. 

f.  No  user  declared  local  variables  are  assigned  to  registers. 


6.  DUMB2  - This  is  DUMB1  with  most  of  the  other  optimizing  features  of 
the  compiler  turned  off.  This  gives  some  idea  of  the  code  that  might 
be  compiled  by  an  unsophisticated  compiler.  The  rationalization  for 
this  is  that  Bliss-11  has  many  optimization  features  that  tend  to  be 
partially  redundant,  i.e.,  when  one  optimization  is  disabled,  another 


A Test  Case 


95 


optimization  may  partially  compensate  to  prevent  an  accurate 
assessment  of  the  effect  of  the  disabled  optimization.  Optimizations 
are  not  generally  additive;  the  whole  is  less  than  the  sum  of  the 
parts.  The  additional  features  disabled  in  DUMB2  are 

a.  Common  subexpression  recognition. 

b.  Peephole  optimization.  The  last  phase  of  compilation  which 
performs  transformations  on  the  object  code  to  produce 
equivalent,  but  more  efficient,  sequences  of  instructions. 

There  are  some  optimizations  in  the  compiler  that  cannot  be  easi1/  disabled  for 
the  purpose  of  these  measurements.  The  most  notable  of  these  is  the  extensive 
special  case  analysis  performed  in  the  code  generation  phase. 


6.4.1  The  programs 

Five  programs  with  a total  of  128  routines  were  selected  for  the 
measurements.  With  one  exception  these  are  "real"  programs  as  opposed  to 
programs  written  explicitly  to  test  the  compiler.  The  programs  were  written  by  five 
different  authors  and  thus  represent  somewhat  different  programming  styles.  The 
authors  have  spent  varying  amounts  of  time  tuning  their  programs  to  thi 
optimizations  performed  by  the  standard  compiler.  The  five  programs  are  described 
briefly  below. 

CODGEN.  The  code  generation  phase  of  a FORTRAN  compiler.  Although  this  is  a 
"real"  program  in  the  sense  described  above,  it  has  been  used  as  a benchmark  of  the 
progress  of  Bliss-11  over  the  last  three  years. 

SPOOK.  A daemon  process  which  periodically  examines  the  state  of  an  operat-ng 
system.  This  program  was  under  development  when  the  measurements  were  taken. 

EVLSTK.  The  stack  evaluation  module  of  an  interactive  interpreter. 

TXQMS.  A simple  queue  management  system.  This  is  one  Of  a series  of 
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programs  designed  to  test  the  original  Bliss-11  compiler, 

KA612.  One  module  of  a powerful  interactive  debugging  system  designed  for 
use  with  Bliss-11  programs.  This  program  contained  the  largest  routine  in  any  of  the 
test  programs  (>500  words  of  code). 

6.4.2  The  numbers 

The  cost  measures  used  for  Biiss-1 1 are  designed  to  minimize  code  size  for 
two  reasons:  (1)  in  a computer  with  a small  addressing  space  such  as  the  PDP-11 
minimal  code  size  is  important,  and  (2)  in  the  PDP-11  minimum  code  size  almost  always 
means  minimum  run  time.  In  the  discussions  that  follow,  the  terms  "less  code"  and 
"better  code"  are  taken  to  be  synonymous. 

Figure  6-1  shows  the  code  sizes  produced  by  PB  and  the  three  TBx  compilers. 
Each  column  shows  the  absolute  number  of  words  of  code  produced  and  the  ratio  of 
this  number  to  the  code  size  produced  by  PB.  PB  is  taken  to  be  a state-of-the-art 
compiler  against  which  the  other  compilers  are  measured.  The  DUMB*  compilers  are 
not  shown  in  this  figure  because  they  are  not  really  comparable  to  the  optimizing 

versions. 

Figure  6-2  shows  the  processing  times  in  seconds  for  the  TNBIND  portions  of 
the  four  optimizing  compilers  as  well  as  the  ratios  to  the  PB  times.  Again,  the  DUMBx 
compilers  are  not  included  because  their  times  are  not  representative  of  the  times 
that  would  be  observed  from  an  unsophisticated  compiler.  This  is  because  the  DUMBx 
compilers  contain  all  of  the  generality  of  the  more  complex  compilers  with  only  the 
benefits  removed. 

In  figure  6-3  we  see  the  ratios  of  the  overall  compile  times  to  the  -'it  compile 
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Figure  6-1.  Code  sizes  (percent  of  PB  size). 


1 

t 


PB 

TB 

TBR 

TBS 

CODGEN 

14.6(100) 

69.7(477) 

35.2(241) 

32.9(225) 

SPOOK 

3.5(100) 

12.4(354) 

8.3(237) 

7.7(220) 

EVLSTK 

6.5(100) 

75.6(1163) 

27.7(426) 

26.8(412) 

TXQMS 

1.5(100) 

9.7(647) 

3.3(220) 

3.1(207) 

KA612 

17.2(100) 

339.5(1974) 

310.6(1806) 

247.7(1440) 

Figure  6-2.  Seconds  of  TNBIND  run  time  (percent  of  PB  time). 
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Figure  6-3.  Ratio  of  total  compile  times  (percent). 


time.  This  is  a measure  of  how  the  more  complex  TNBIND  operations  affect  the  total 
compile  time. 

Figure  6-4  shows  a breakdown  of  the  differences  in  routine  size  by  overall 
size  of  the  routine.  The  numbers  represent  the  difference  produced  by  the  "best"  of 
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the  TBx  compilers  using  PB  as  a base.  By  "best"  we  mean  that  the  size  difference 
reported  is  the  difference  between  PB  and  the  smallest  amount  of  code  produced  by 
and  on  the  TBx  compilers.  All  128  routines  are  included. 
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Figure  6-4.  Differences  in  code  size  by  routine. 
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Figure  6-5.  Code  sizes  and  TNBIND  runtimes  for  DUMB*  compilers. 


Figure  6-5  shows  the  code  sizes  and  compile  times  produced  by  the  DUMBx 
compilers  along  with  the  percentage  of  the  PB  sizes  and  times.  These  numbers  do 
not  represent  actual  compilers  and  therefore  should  not  be  taken  to  be  the  results 
one  might  obtain  from  compilers  producing  code  with  the  same  degree  of 
sophistication.  In  particular  the  run  times  of  the  DUMBx  compilers  are  slightly  larger 
than  the  run  times  for  PB.  The  run  times  for  the  DUMBx  compilers  are  useful  Only  for 


A Test  Case  99 

comparisons  with  the  TBx  times  to  see  the  costs  the  particular  operations  omitted  in 
the  DUMBx  compilers. 


6.4.3  Discussion 

We  see  from  Figure  6-1  that  there  is  relatively  little  difference  in  the  code 
sizes  produced  by  the  PB,  TB,  TBR,  and  TBS  compilers.  This  may  be  attributed  to  the 
fact  that  the  TBx  compilers  actually  add  relatively  little  to  the  TNBIND  processing 
already  performed  by  PB.  The  point  of  the  comparison  is  that  the  TNBIND  in  PB  has 
grown  piecemeal  over  the  last  three  years  while  the  TNBIND  in  the  TBx  compilers  is 
the  product  of  about  a month's  work.  Working  from  the  general  model  it  was 
possible  in  a short  time  to  produce  a TNBIND  that  compares  favorably  with  a finely 
tuned  production  compiler. 

Although  we  might  expect  that  the  code  quality  (size)  would  get  progressively 
better  as  we  moved  from  TBS  through  TBR  to  TB,  we  see  from  Figure  6-1  that  this  is 
not  always  true.  One  reason  for  this  anomaly  is  the  interaction  of  optimizations  in  the 
compilers.  TNBIND  can  make  optimization  decisions  only  on  the  basis  of  the  cost 
measures  supplied.  While  these  decisions  are  made  on  a global  scale  as  far  as  the 
register  allocation  is  concerned  (see  discussion  of  global  vs.  local  allocations  in 
Section  2.1),  the  decisions  are  local  in  the  sense  that  they  are  made  without 
interaction  with  other  phases  of  compilation.  Even  optimal  solutions  to  the  allocation 
problem  based  on  the  cost  measures  defined  within  TNBIND  may  not  produce  the  best 
code  when  considered  in  the  context  of  the  other  compiler  optimizations. 


It  is  interesting  to  note  that  even  in  the  worst  case  the  code  produced  by 
DUMB2  is  less  than  twice  the  size  of  the  code  produced  by  the  other  compilers.  One 
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would  expect  that  a truly  unsophisticated  compiler  would  be  worse  than  the 
optimizing  compilers  by  a factor  of  three  of  lour  at  least.  The  tact  that  the  DUMB 
compilers  do  as  well  as  they  do  in  the  size  comparison  is  a tribute  to  the  optumzation 
and  extensive  special  case  analysis  done  in  the  other  phases  of  the  compiler 

The  data  in  Figure  6-2  is  more  disturbing.  The  TBx  compilers  require  much 
more  time  than  the  PB  compiler  to  perform  the  came  task  and  produce  approximately 
the  same  results.  It  is  expected  that  the  TBx  compilers  would  require  more  time 
since  they  are  doing  more  computation,  but  the  differences  are  large  both  relatively 
and  absolutely.  Consider  the  case  of  KA612  which  uses  about  17  seconds  for  TNBIND 
in  PB  and  more  than  five  minutes  tor  TNBIND  in  TB.  Closer  examination  of  the  timings 
shows  that  the  bulk  of  the  difference  is  accounted  for  by  the  lifetime  computations. 
PB  uses  a simplified  lifetime  characterization  which  is  computed  on-the-fly  (in  linear 
time)  during  its  TNP  phase.  There  is  no  increase  in  the  cost  of  lifetime  determination 
due  to  routine  size.  TB  uses  the  full  matrix  closure  algorithm  o<  Chapter  t tor  all 
interesting  TNs.  This  algorithm  experiences  an  n2  increase  in  running  time  as  the  size 
(number  of  linear  blocks)  of  a routine  increases.  Nearly  hall  of  the  TNBIND  t'me  used 
in  KA612  is  accounted  for  in  the  lifetime  determination  for  a single  routine.  This 
routine,  the  largest  in  the  measured  programs,  has  160  linear  blocks  end  70 
interesting  TNs.  The  resulting  cod.  tor  the  routine  is  575  words  from  PB  and  595 
words  from  TB.  The  effect  ol  the  lifetime  computation  on  run  time  can  be  seen  in  the 
differences  between  the  TBS  values  in  Figure  6-2  and  the  values  lor  DUMB1  from 
Figure  6-5.  It  must  be  noted  at  this  point  that  the  DUMBx  compilers  are  not 
■lightening  fast"  for  two  basic  reasons.  (1)  The  TNBIND  phase  accounts  for  only 
about  107.  of  the  total  compile  time  and  therefore  even  reducing  tne  TNBIND  time  to 
zero  would  not  produce  drastic  changes  overall.  (2)  The  DUMBx  compilers  have  to 
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pay  the  price  of  many  of  the  more  sophisticated  algorithms,  e.g.  lifetimes  and  packing, 
without  reaping  the  benefits.  The  DUMBx  compilers  were  generated  mainly  to  show 
code  size  and  not  compile  times. 

The  other  major  difference  in  runtime  between  PB  and  TB  is  the  complete 
enumeration  packing  algorithm,  The  packing  done  by  PB  is  essentially  the  same  as 
that  done  by  TBS.  The  only  basic  difference  is  that  PB  uses  a more  elaborate 
reshuffling  scheme.  PB  will  reshuffle  several  TNs  and  may  recur  several  times  in 
trying  to  make  room  for  a new  TN.  TBS  will  try  to  reassign  only  one  TN  and  will  not 
recur  if  the  reassignment  is  not  possible.  The  difference  between  the  TB  and  TBR 
columns  in  Figure  6-2  is  the  cost  of  the  complete  enumeration.^  In  only  a few  cases 
did  TBR  examine  more  than  the  initial  complete  solution.  The  difference  between  the 
TBR  and  TBS  columns  is  the  overhead  of  maintaining  the  mechanism  for  the 
enumeration  algorithm.  TBS  does  no  backtracking  or  state  saving;  its  solution  is  the 
initial  solution  from  which  TB  and  TBR  begin  to  search. 

In  Figure  6-3  we  see  that  the  compile  time  for  the  TBx  compilers  increased 
over  the  PB  time  by  just  over  a factor  of  three  in  the  worst  case  and  less  than  a 
factor  of  two  in  most  cases.  This  performance  is  quite  respectable,  considering  that 
this  was  the  first  implementation  of  a general  model  of  register  a'locatior.. 

Figure  6-4  shows  how  the  TBx  compilers  compared  with  PB.  An  example  of 
how  to  read  the  table  is  probably  the  best  explanation  of  what  the  data  represents. 
The  routines  are  separated  into  groups  based  on  the  size  of  the  code  produced  by 


+ As  noted  above,  the  enumeration  was  actually  limited  to  a fraction  of  the  solution  to 
avoid  the  exponential  computing  time  that  would  be  required  by  large  problems.  In 
compiling  programs  to  collect  the  test  data,  the  search  was  limited  to  examining  50n 
solutions  where  n is  the  number  of  TNs  to  be  packed.  This  limit  was  reached  by  17  of 
the  128  routines. 
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the  PB  compiler.  Lets  consider  the  routines  which  were  compiled  into  11  to  25  words 
of  code  by  PB.  This  is  the  second  group  of  numbers  in  the  table.  The  second  column 
tells  us  that  there  35  routines  in  this  size  range.  In  the  third  column  we  see  that  the 
TBx  compilers  produced  code  which  was  five  words  smaller  than  the  PB  code  one 
time,  one  word  smaller  four  times,  the  same  size  28  times,  three  and  four  words 
larger  one  time  each.  The  intent  of  this  table  is  to  report  all  of  the  size  differences 
without  making  separate  entries  for  each  of  the  i28  routines.  The  routines  are 
separated  into  groups  by  size  to  give  some  idea  of  the  percentage  differences. 

An  interesting  statistic  in  gauging  the  effectiveness  of  the  various  parts  of  the 

TNBIND  Drocedure  is  the  relative  score  of  the  four  optimizing  compilers  on  the  128 

test  routines.  Each  compiler  is  given  a point  when  it  is  the  simplest  compiler  to 

produce  the  smallest  amount  of  code  for  a routine.  No  points  are  given  if  all 

compilers  produce  the  same  code.  Simplicity  is  measured  by  how  many  of  the 

sophisticated  TNBIND  algorithms  are  included;  thus  PB  is  the  simplest  followed  by 

TBS,  TBR  and  TB.  If  for  some  routine  the  code  sizes  produced  b>  the  compilers  are 

PB  20  words 
TB  18 
TBR  18 
TBS  19 

then  TBR  would  get  one  point.  Computing  this  measure  over  the  128  test  routine 
gives  the  following  ;esult: 

PB  19 
TB  6 
TBR  5 
TBS  12 

This  result  is  interesting  because  it  points  out  the  import^ce  of  precise  lifetimes  as 
opposed  to  the  sophisticated  packing  algorithm.  There  were  23  routines  for  which 
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better  code  was  produced  by  one  of  the  TBx  compilers.  Twelve  times  out  of  that  23 
the  best  code  was  produced  by  the  addition  of  only  the  more  precise  lifetime 
characterization.  The  only  essential  difference  between  PB  and  TBS  is  the  lifetim-' 
characterization.  In  the  test  cases,  TBS  was  actually  actually  a restricted  version  of 
TB.  It  is  reasonable  to  conclude  that  adding  the  more  precise  lifetimes  to  the  PB 
compiler  would  result  in  a compiler  better  overall  than  any  of  the  four  tested.  In 
reaching  this  conclusion  we  assort  that  the  cases  in  which  PB  produced  better  code 
than  any  of  the  TBx  compilers  are  due  to  a combination  of  tuning  and  chance.  The 
tuning  argument  states  that  is  is  reasonable  that  a finely  tuned  simpL  algorithm  will 
sometimes  produce  better  results  than  a more  sophisticated  general  algorithm.  The 
question  of  chance  brings  us  back  to  the  issue  of  compensating  optimizations.  Tl.e 
peephole  optimization  phase  of  Bliss-11  can  have  drastic  effects  on  the  amount  of 
code  produced  by  combining  unrelated  sequences  of  instructions  which  happen  to  be 
identical.  When  a TN  will  fit  into  more  than  one  register,  we  arbitrarily  choose  the 
first  one  we  find.  It  might  be  that  some  other  choice  would  allow  or  disallow 
combining  sequences  of  instructions.  Examination  of  the  code  produced  by  the 
various  compilers  showed  that  the  largest  differences  (both  positive  and  negative) 
were  mainly  due  to  the  action  of  the  peephole  optimization  rather  than  the  TNBIND 
actions.  Since  the  packing  algorithms  of  PB  and  the  TBx  compilers  are  different,  it  is 
not  necessary  that  these  arbitrary  choices  would  be  the  same  in  any  two  compilers. 

6.4.4  Evaluation 

One  of  the  goals  of  this  thesis  is  to  explore  extensions  to  register  allocation 
algorithms.  In  pursuing  that  goal  we  have  looked  specifically  at  the  TNBIND  model 
and  the  lifetime  and  packing  phases  of  that  model.  The  test  results  have  shown  that 
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it  is  possible  to  produce  according  to  the  general  TNE1ND  model  a compiler  which 
compares  favorably  in  code  quality  with  a state-of-the-art  optimizing  compiler. 

The  TBx  compilers  cannot  match  the  PB  compiler  in  execution  speed,  but  the 
increase  is  not  (with  one  exception)  an  unreasonable  tradeoff  for  the  efficiency  in 
production  of  the  compiler.  A slowdown  by  a factor  of  2-10  is  tolerable  for  a first 
try  at  using  the  general  model.  Although  there  are  no  statistics  generally  available, 
one  would  expect  that  similar  degradations  in  performance  were  experienced  by  the 
first  automatically  produced  parsers. 

Another  point  to  consider  in  evaluating  the  results  is  that  while  the  lifetime 
computation  is  an  n^  algorithm  in  the  size  of  the  routine  being  compiler,  the  trend  in 
programming  is  toward  smaller  routines.  Indeed  a compiler  which  produces  very 
good  code  but  require'  long  compilation  for  large  complex  routines  can  be  a factor  in 
encouraging  programmers  to  write  smaller,  more  easily  understood  routines.  In 
addition,  studies  such  as  that  by  Knuth  [Knu71]  have  shown  that  programs  In  general 
are  very  simple  implying  that  the  complex  cases  which  require  large  amounts  of 
computing  should  be  relatively  rare. 


r. 


\ 


[ 


: 


■ 

■ 


! 

:■ 


* 

1 

I 


j 


Chapter  7 
Conclusion 


In  this  chapter  we  consider  again  the  overall  view  ot  what  is  usually  called 
register  allocation.  The  view  is  on  a higher  level  than  that  on  which  register 
allocation  is  usually  discussed,  and  does  not  explicitly  include  some  of  the 
optimizations  frequently  performed  by  optimizing  compilers.  We  will  show  that  the 
TNBIND  philosophy  can  easily  be  extended  to  incorporate  other  optimizations  which 
can  be  expressed  at  the  source  level  of  the  program. 

This  chapter  also  summarizes  the  contributions  of  the  thesis  and  considers 
directions  for  future  research  in  this  area. 

7.1  The  TNBIND  Model -Review 

The  TNBIND  model  of  temporary  storage  allocation  is  more  closely  related  to 
the  source  level  of  the  program  than  to  the  object  level.  As  a result,  the  model  *s 
easily  adaptable  to  new  languages  or  machines.  The  basic  points  of  the  model  are 

1.  Information  local  to  some  part  of  a program  is  computed,  stored,  and 
used  over  contexts  larger  than  a single  statement  or  expression. 

2.  Within  some  piece  of  program  which  can  reasonably  be  considered  as 
a unit,  a compiler  has  the  responsibility  to  allocate  physical  resources 
to  hold  the  computed  information. 

3.  There  is  no  inherent  difference  between  the  intermediate  results 
produced  by  a compiler  in  the  process  of  evaluating  arithmetic 
expressions  and  the  intermediate  results  produced  by  a programmer 
in  the  process  of  performing  some  complex  algorithm. 

Chapter  2 expanded  these  ideas  to  a model  of  implementation.  That  model 
described  the  assignment  of  unique  names  (TNs)  to  the  items  of  information  to  be 
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stored,  the  collection  of  data  to  determine  lifetimes  and  preferences,  and  finally  the 
association  of  each  TN  with  some  physical  location  in  the  target  machine. 

The  advantage  of  the  model  is  that  it  deals  with  representations  of  the  source 
program  which  still  contain  much  information  about  the  semantics  and  control  flow  of 
the  program.  This  high  level  analysis  makes  global  allocations  easier.  Recall  from 
chapter  1 that  a local  allocation  occurs  entirely  within  linear  blocks  while  global 
allocations  consider  larger  contexts.  With  a local  allocation,  interblock  transfer  of 
information  must  be  via  memory  locations  which  can  be  fixed  throughout  the  program 
segment  being  considered.  Since  registers  are  a scarce  resource  (relative  to  main 
memory),  traditional  compilation  techniques  have  reserved  the  use  of  registers  to 
those  data  items  which  are  used  only  in  local  contexts. 

Traditional  register  allocation  methods  deal  with  actual  machine  instructions  for 
which  the  allocator  must  assign  registers.  Without  knowledge  of  what  registers  will 
be  available,  the  code  generator  must  compile  code  for  the  worst  case.  In  the 
TNBIND  model  the  register  allocator,  presented  with  the  constraints,  chooses  a set  of 
bindings  of  TNs  to  locations  which  will  minimize  the  cost  (maximize  the  profit)  of  the 
resulting  object  program,  The  code  generator  is  then  able  to  produce  code  which  Is 
specifically  tailored  to  the  TN  bindings. 

The  TNBIND  model  was  based  on  the  compiler  structure  of  Bliss-11.  A question 
naturally  arises  as  to  what  effect  the  adoption  of  this  compiler  structure  has  on  the 
TNBIND  concept.  The  lifetime,  ranking,  and  packing  subphases  of  TNBIND  are 
independent  of  any  change  in  the  compiler  structure.  Those  subphases  require  only 
that  flow  information  be  available  and  that  certain  decisions,  e.g.  evaluation  order, 
have  been  made.  If  we  can  determine  the  connection  matrix  and  the  creation/use 
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points  of  the  TNs,  we  can  perform  the  lifetime,  ranking,  and  packing  operations  at  any 
point  in  the  compilation.  In  that  6ense  a good  deal  of  TNBIND  is  independent  of  the 
compiler  structure. 

On  the  other  hand,  if  we  had  not  started  with  the  Bliss-11  decomposition  we 
might  not  have  developed  the  model  as  we  did.  One  can  argue,  we  believe,  that  the 
Bliss-11  structure  is  the  "right"  structure  for  a highly  optimizing  compiler. 

7.2  More  traditional  optimizations 

Ther*  is  no  reason  why  any  optimization  of  register  utilization  which  can  be 
expressed  at  the  source  level  of  a program  cannot  be  incorporated  into  the  TNBIND 
model.  One  such  optimization  which  has  been  discussed  frequently  Is  the  use  of  a 
register  to  hold  the  value  of  a frequently  accessed  variable  during  the  execution  of  a 
loop.  This  is  done  by  inserting  instructions  to  load  the  value  into  a register  before 
the  loop  and  to  store  the  value  after  the  loop.  If  it  can  be  determined  that  the  va>'ie 
of  the  variable  cannot  be  changed  within  the  loop  then  the  store  instruction  can  be 
eliminated.  This  is  the  optimization  described  by  Lowery  and  Medlock  [Low69].  To 
incorporate  this  optimization  into  an  implementation  of  TNBIND  we  simply  assign  a TN 
to  hold  the  value  of  the  variable  during  the  loop  and  add  the  necessary  load  and 

i 

store  as  assignment  nodes  in  the  parse  tree.  The  TN  is  is  marked  "register  or  forget 
it"  so  that  failure  to  assign  a register  to  the  TN  will  result  in  the  TN  being  bound  to 
the  location  of  the  variable.  Presumably  the  code  generator  in  a compiler  which 
includes  these  optimizations  is  clever  enough  to  ignore  requests  to  assign  the  value 
of  a variable  to  itself. 

The  addition  of  such  optimizations  at  the  level  of  TNBIND  rather  rhan  at  the 
source  level  means  that  the  TNs  generated  for  the  optimization  are  considered  as 
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equal  competitors  for  the  registers  available  to  the  allocator.  A/e  can  then  assert 
with  reasonable  confidence  that  choosing  registers  for  some  such  TNs  and  not  for 
others  is  the  correct  decision,’ 

7.3  Contributions  of  the  thesis 

This  thesis  has  presented  a high  level  model  of  a global  register  allocation 
process.  The  model  in  fact  deals  with  the  more  general  problem  of  assigning 
locations  to  hold  computed  values  during  the  execution  of  a compiled  program.  The 
modei  is  not  restricted  to  the  situation  which  allows  only  two  classes  of  locations  to 
be  used  for  such  temporary  storage.  The  model  defines  a cost  which  is  associated 
with  each  entity  to  be  stored  and  each  class  of  storage  location.  The  concept  of 
targeting  values  to  particular  locations  is  not  new,  but  its  application  in  terms  of  the 
model  is  different.  Preferencing  is  a new  concept  and  Is  very  useful  in  providing 
direction  to  what  would  otherwise  be  an  arbitrary  decision. 

The  fact  that  the  registers  are  bound  before  code  generation  means  that 
meaningful  special  case  analysis  can  be  profitably  applied  during  code  generation. 
The  TNBIND  model  also  provides  a starting  point  from  which  to  generate  this  phase  of 
a compiler  automatically.  The  basic  knowledge  needed  by  TNBIND  is  a list  of  what 
operands  of  what  operations  must  or  must  not  be  in  a particular  class  of  storage. 
The  cost  measures  can  be  as  simple  or  as  complex  as  the  compiler  designer  wishes. 

It  may  be  observed  that  some  of  the  ideas  reported  here  are  also  presented  in 
[Wul75].  In  the  thesis  we  have  taken  the  basic  compiler  structure  of  Bliss-11  as 
given,  but  we  have  not  taken  the  details  of  the  implementation,  We  have  developed 


t Correct  in  the  sense  that  the  decisions  minimize  the  cost  of  the  object  program  as 
we  have  defined  the  cost  measures. 
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the  general  model  of  register  allocation  and  explored  some  new  dimensions  of  the 
solution  space.  The  emphasis  of  the  thesis  has  been  on  providing  a framework  within 
which  specific  solutions  to  particular  problems  can  be  explored.  The  intent  is  that  the 
register  allocator  of  a compiler  should  be  produced  (either  manually,  mechanically,  or 
more  likely  a combination  of  the  two)  according  to  the  TNBIND  model  and  then  tuned 
to  the  pecularities  of  the  language  and  machine.  It  is  expected  that  this  tuning  will 
usually  take  the  form  of  adjusting  tho  cost  measures  and  introducing  preferencing. 
Preferencing  is  a particularly  powerful  technique,  because  it  allows  us  to  indicate  the 
code  we  want  to  generate  in  the  normal  case  without  precluding  what  we  must  do  in 
the  worst  case. 


7.4  Future  research 


7.4.1  Better  algorithms 


As  shown  by  the  timing  data  in  chapter  6,  the  lifetime  determination  can  become 
very  costly  for  large  routines.  On  the  other  hand,  adding  the  more  precise  lifetime 
characterization  to  the  Bliss-11  compiler  produced  better  code  in  about  107.  of  the 
routines  examined  in  chapter  6.  We  need  to  develop  better  algorithms  for  dealing 
with  very  large  problems.  Equally  as  important  is  the  development  of  fallback 
positions  of  lifetime  characterizations  which  are  accurate  but  possibly  less  precise 
and  less  costly  to  determine. 


7.4.2  Automatic  TNQIND  generation 


One  of  the  long  range  goal;;  for  research  in  this  area  is  to  enable  the  automatic 
production  of  optimizing  compilers.  Some  parts  of  TNBIND  are  independent  of  both 
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the  language  and  the  machine  and  thus  could  be  copied  directly  to  new  environments 
(modulo  the  language  in  which  the  compiler  is  implemented).  In  this  category  are  the 
lifetime  determination  and  the  basic  cost  collecting  functions.  The  actual  cost  values 
depend  on  the  target  machine  and  the  designer’s  criteria  for  optimizatior . 

The  TNP  phase  requires  Knowledge  about  the  language  to  be  compiled  and 
some  way  of  determining  wha*  code  sequences  might  be  used  to  Impla-nent  each 
construct  in  the  language.  Some  work  in  the  latter  area  has  been  dine  by  Newcomer 
[New75],  but  further  work  is  nocessary  to  iron  out  the  differences  in  the  interfaces 
expected  by  tt  e various  phases  of  the  resulting  compiler. 

As  suggested  in  chapter  1,  the  step  from  the  TNBIND  model  to  a program  which 
produces  a running  TNBIND  is  "merely  a matter  of  implementation."  At  the  current 
time,  however,  a good  deal  of  judgement  is  required  to  make  (he  transition. 


Appendix  A 

The  TNP  Routines 


Listed  below  are  the  actual  unadulterated  TNP  routines  from  the  Bliss-il 
compiler.  The  interesting  point  is  not  exactly  what  each  routine  does,  but  that  the 
operations  for  most  node  types  are  highly  similar.  The  routines  become  complex  only 
we  wish  to  exploit  special  features  of  the  hardware  (e.g.  in  the  store  operator)  or  to 
make  clever  implementation  decisions  (e.g.  routine  calis).  The  macros  TLRDEF  and 
NDRDEF  are  used  to  consolidate  the  common  features  of  most  of  the  TNP  routines. 


macro  TLROEF (NRMh, BOOY)- 

routin.  10 (TL)NAME (N00E, TARGET). 

( map  GTVEC  NOOE,  TNHORQ  TARGET)  BOOY)  .N0DEIREGF)>S| 


macro  NOROEF (NRME.BOOY). 

roulina  10  (NO)NAHE  (NOOE, TARGET)* 

( map  GTVEC  NOnE,  TNUORO  TARGET)  BOOY)  novalua)*) 


TLROEF  (NULL,  (ACCESS  (.flYTN) ; N0TE0T0) > [ 


NORCEF (B, ( 

local  TOPI, T0P2, TAR, NTAR; 

T0P1-T0P2-6) 
l«  . NODE  (TPATH) 

thon  IrevorEO  ordor 
begin 

TAR*-.  NOOE  (0PR2) ) NTAR-.  NOOE  (0PR1J ) 

T0P2*-. flYTN)  T0P1-8| 

end 

else  (normal  order 
begin 

TAR*-.  NODE  C0PR11 ) NTAR*-.  NOOE  I0PR2I ) 

TOPI*-. HYTN)  T0P2-B) 

and; 

TNP  (.  NOOE  (0PR1J,.  TOPI)) 

TNP (. NODE  (0PRZ1 , . T0P2) » 

PAEFI10VE  (.  TAR,  .HYTN)  ( 

OPERATE  (. NTAR, .MYTN) , 

>>l 
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NOROEFUJ, < 

TNP  (. NOOE  C0PR1J , ,H  1N)| 
PREFMOVE  (.N00E10or  l],.MYTN)j 
>>l 


« 


NOROEF  (UX,  ( 

TNP  (.NOOE (OPR  11 ,0)  | 

PREFMOVE (.NOOE  (OPR  11 , .flYTN)  | 

)»| 


NDROEF  (PTR,  ( 

TNP  ( . NOOE  (OPR11 , 8)  j 
PREFflOVE  (.NODE  (OPRU  , ,MYTN)  | 
If  . NOOE  IS I2EF1  eql  6 then 
If  .NOOE  1POSF]  8ql  0 then 
ACCESS  (.NOOE  (OPRII ) | 
>>l 


NOROEF  (OOT,  ( 

TNP  (.NOOE  I0PR1) , . M VTN)  | 

If  (.NOOElflOOEl  eql  INOEXEO)  or  (.NODE  IMOOEl  eql  1NOEXEO+OEFERREO)  then 
MflKESRME  (.MYTN, . NOOE  (OPRII ) ; 

>>l 


, 


routine  ISBIT(LEX). 
beg  in 

m«p  LEXEME  LEX; 
bind  CTVEC  NOOE-LEX, 

If  .LEXILTYPF]  eql  GTTYP 

thon  If  .NOOE (NOOEXj  eql  SBITOP 


then  (TNNEEOEO) 


1 


end; 


NOROEF  (REL,  ( 

locel  LOP, ROP; 

bind  B1TLEFT  -ISBITC. NODE I0PR1I  > , 

BITRIGHT.ISB1T  (.NODE  I0PR21  > j 
LOP*-ROP>-0| 

I « B ITLEFT  then  LOP*, MYTN  else 
I « BITR1GHT  thon  R0P*.MYTN| 

TNP  (.NODE  10PR11 , .LOP)  | 

TNP ( .NOOE  C0PR21 , . R0P>  j 

H B ITLEFT  then  (PREFMOVE  (.NOOE  (OPR1) , .MYTN)  | COMPARE  (.MYTN,  .NOOE (0PR2) ) ) else 
If  BITRICHT  then  (PREFMOVE  (.NOOE  10PR2J , .MYTN)  j COMPARE  (.MYTN,  .NODE  (OPR1J  ) ) else 

(COMPARE (.NODE COPR1J , .NOOE 10PR2J > j MOVE <0,. MYTN) )i 

))|  ... 


i i 
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NOROEF  (FPAR,  < 

local  GTVEC  UOP.-PflRIIj 
bind  LEXEME  PARtlLEX=PARN; 

TNP  (PARM*.  NOOE  10PR11 , I f .NOOEICSCOMPL!  leq  HRG1C2  than  .MYTH  8>| 
PREFMOVE  (.N00EI0PR1) , .MYTN) | 

>>l 


NOROEF (LORONOOE, ( 

local  UOP, LEXEME  0P1 j 
TNP  (OPlr.NOOE  EOPR 13 ,8)  | 

If  .OP1 1LTYPFI  neq  BNOVAR 
than  PREFMOVE  < . 0P1 , . MYTN> 
alia  MOVE <.0P1,  .MVTN)  | 

>>l 


NOROEF  (AOOSUB, ( 

local  T0P1.T0P2, TRR.NTRRj 
TOP1*-TOP2*-0| 

If  .NOOE  ITPRTH) 

than  Irevarsa  ordar 
bag  In 

TAR*. NOOE  10PR2J I NTAR*.N00E10PR1)  | 

TOP2<-.HYTN j TOP1*0| 

and 

alia  Inormal  ordar 
bog  In 

TAR...NOOEIOPR11  | NTAR*.NOOEtOPR2I  | 

TOPI*-. MYTH ] T0P2*0| 

and; 

TNP  (.NOOE  10PR1] , . TOPI)  | 

TNP (. NOOE r3PR21,.T0F2>| 

If  (NOT  .NOOEIFCMTFJ ) and  (NOT  .NOOEtRCMOFl) 
thr.n  MALESANE (.MYTN,  .TRR> 
alas  PREFMOVE  (.TOR, ,MYTN>  j 
OPERATE  (.  NTflR,  .MYTN)  | 

>>l 


routine  8INOSTORE <LOP,ROP)» 

begin  map  GTVEC  LOP : ROP j 
bind  LEXEME  LL0P-L0P; 

ROPrBAScTN (. ROP) ; 
if  .ROP  leq  7 than  return  NOVALUE; 
if  .R0PIREQ01  neq  0 then  return  NOVALUE t 
if  .LLOP ILTYPFJ  aql  LITTYP 
than  begin 

ROP  IREQO)  *-MEMNEQOBj 
ROP  ITNLITB1T)»1| 
R0PITNL1TLEX)*.L0P| 

ROP  I8N0TYP) «-0j 
return  NOVALUE 
end) 

ROP  CREQOI  *-MEMREQ0B; 

ROP  CRECFJ  *..L0Pj 
NOTEUSE (.LOP, .ROP  tLONFU) ) | 
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NOVRLUE 

end; 


routine  FINOLEX  (LEX,TNEE>» 
begin 

map  LEXEME  LEX,  G fVEC  TREE; 
bind  LEXEME  TLEX=TREE; 
macro  F INONCSE <N00E> - 

(map  GTVEC  NODE-, 

If  .NOOEIREGF)  leq  7 (Hen  return  0 else 

If  .CT  C.NOOE  IREGF) , BNDTYP)  noq  BNONCSE  then  return  6 elee 

N00E<-.GTI. NOOEIREGF) , OFFSETF)  | 

If  .NODE  ITVPEF)  eql  GRRPHT  then  return  0; 

N00E<LT  )'.°F>*-BN0VflR)  S| 


It  . LEX  ILTVPF)  eql  GTTYP  the.  FINONCSE  (LEX) ; 

It  . TLEX ILEXPRRT)  eql  .LEX  ILEXPRRT)  then  return  1| 

It  . TLEX  ILTYPF)  eql  GTTYP  then 

if  . TREE  INOOEX)  eql  S t'NNULL  then  (FINONCSE  (TREE)  j return  F INOLEX  (.  LEX,  .TREE)) 

else 

Incr  I from  0 to  . TREE INOOCS I2EF ) -1  do 

It  FINOLEX  (.LEX, . TREE  tOPERRNO (. I>) > then  return  lj 

0 

end; 


routine  FINOLEFT <LN,RN)« 
begin  local  X , Y j 

map  GTVEC  LN: RN;  bind  LFXEME  LRN«RN| 

If  ,LRN  ILTYPF)  noq  GTTYP  thon  return  8| 

If  . RN INOOEX)  eql  S00T0P  thon  return  (.RNIDPR1)  eql  . LH»  j 
1 1 NOT  (ONEOF (. RNINOOEX) .BITS (SROOOP , SH INOP , SRNOCP , SDRDP , SSMRBOP ) > 
or  ONEOF  (.RNINOOEX) -2, BITS <SS H I F TDP-2 , SR0T0P-2 , 

SMHXOP-2 , SM INNDP-2 , SEQVGP— 2 ( SXOROP— 2 ) ) ) 

then  return  0| 

If  .RNITPPTH)  then  (X-.  RN  I0PR2)  ; Y-.  RN  I0PR1)  > 
else  (Y*-,  RNI0PR2)  jX«-.RNICPRl)  > j 
If  .RHINOOES I2EF)  eql  2 then 

It  FINOLEX(.LN,.Y)  then  return  0| 

If  <Y-FINOLEFT(.LN,.X)>  neq  0 then  .Y+l  else  0 
end; 


”*  I 
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rout  in*  ISNEGNDT  <LN,RN)« 
begin 

map  CTVEC  LNi RH| 
local  CTVEC  LRNiLLRNj 

b-  LEXEME  LNLEX-LN* RNLEX-RN: LRNLEX-LRNj LLRNLEX-LLRN| 
. .RNLEX  (LTYPF)  neq  CTTYP  then  return  0; 
if  . RN (NOOEX)  noq  SNEGDP  then 

if  . RHtHOOLX)  neq  SHCTOP  then  return  0| 
LRNr.RNlOPRl)  | 

if  .LRNLEX  (LTYPF)  neq  CTTYP  then  return  Jj 
if  .LRNtNOOEX)  neq  SGOTOP  then  return  8| 

LLRN<-.LRN  tOPRl)  | 

if  EQLPOSSIZE  ( . LN , . LLRN)  then  return  1| 

0 

end  | 


routine  S ItiPLESTORE (LN, RN) - 
begin 
! 

! fl  "SIMPLE"  STORE  IS,  BY  DEFINITION,  ONE  WHICH  DOES  NOT 
! NEEO  A SPECIAL  TEMPORARY  FOR  THE  RNS. 

) 

! VALUE  RETURNED; 

! -1  !i  WE  HAVE  A STORE  OF  THE  FO.TM 

! (EXPR1)  ► . (EXPR1)  OP  (EXPR2) , 

! OR  (EXPR1)  ► NOT  . (EXPR1) 

! OR  (EXPR1)  ► - . (EXPR2)  j 

! THE  ’RCMTF’  BIT  OF  THE  ’OP’  (OR  ’NOT'  OR  ’-’)  NODE 

! SHOULO  PE  TURNEO  OFF. 

! 1 ii  WE  HAVE  SOME  OTHER  KINO  OF  SIMPLE  STORE,  E.C. 

I VAR1  . VAR2  ♦ 3; 

I THE  ’RCMTF’  BIT  OF  TNE  ’OP’  NODE  SHOULO  BE  LEFT  AS  IS. 

I 0 n THE  STORE  ARE  OEALINC  WITH  IS  NOT  SIMPLE. 

| 

macro 

AOOORS'JflnONEOF (.RN INOOEX)  ,BIT2 (SAODOP, SMINDP)  )$, 

ANtlORIOR«ONEOF(.,r  'INOOEX)  ,BIT2  (SANDOP.SOROP))*, 

SPECIALCASESe 

NOT  ONEOF (.RN INOOEX) , (B IT3 (SPLUSOP , SXOROP , SEQVOP)  or 
BMSIIX (SGTRDP,6)  or 
BMSKX  <SGTRU0P,6> 

map  CTVEC  LN: RN; 

local  CTVEC  LRN: LLRN,  RRNj 

bind  LEXEME  LNLEX=LNi RNLEXnRNiLRNLEX«LFNiLLRNLEX«LLRN; 
bind  SIMPLEVAL-3; 

routine  SIMPLOP <NDDE)=  (map  G’VEC  NODE; 

if  .NODE  (NOOEX)  leq  MflXOPERATOR  then  1 else 
if  .NOOE  (NOOEX)  eq!  SYNNULL  then  1 eiee 
if  .NOOE (NOOFXl  eq i SSTOROP  then 

SIMPLOP  < i ♦ .NOOE  ITPATHI  then  .NDOE(OPRl)  alee  . NOOE  IOPR2J) ) | 


routino  ISPSOK <LN,RN)« 
bogin 

map  CTVEC  LN i RN; 

bind  LEXEME  LNLEX-LN : RNLEX-RN ( 

local  SSPj 


1 
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macro  ISBISORBICo 

(If  AN00RI0R  thon 

(local  LEXEME  RhN:LRNt 
l(  , RNITPATH1 

thon  (LRN-. RN I0PR2I  ; RRN*-. RN (0PRI1  > 
else  (LRN-.RNI0PR1);  RRN*-.  RN  I0PR21 1 1 
(1  aM  (.RRN IKNOTFI  eqv  (.RNIN00EX1  eql  SANDOP))! 
♦ . LNN  IKNUTFI  *2  ))S; 

macro  ISINCOROEC* 

(l(  fiODORSUD  then 

l(  abs  <EXTEKD(.RN(OFFSETPJ ) ) eql  1 then 
(.RN  IRCAF*  or  .RNIRCSFJ  ))S| 
macro  ISCOMDRNEGb 

(l(  . RN (NDDEX)  eql  SPLUSOP  then 
(.RNIRCCFI  or  . RN IRCNTFI ))S| 


SSP.-.LNLEXISSPF1  i 
If  .SSP  leq  PP816 
thon  TRUE 

else  If  .SSP  leq  PPOB 

thon  (If  IS INCOROEC  then  TRUE 

else  if  1SCDM0RNEG  then  TRUE 
else  ISBISORBIC) 
else  (ISBISORBIC  eql  1) 
enri| 


If  .RNLEX ILTYPFI  neq  GTTYP  then  return  0j 
If  .LNLEX ILTYPFI  eql  GTTYP  thon 
begin  macro  PHOCEEOuex 1 1b  I och$| 

If  .LNINODEXI  eql  SYNPOI  thon 

If  ISPSGi:  ( . LN, . RN)  then  PR0CEED( 
if  NOT  SIMPLOP(.LN)  then  return  0 
erri| 

If  . LNLEX ILTYPFI  eql  BNOVAR  thon 
begin  local  Xj 

If  .LNLEXILEXPPRT;  eql  . PCREG  thon  return  0) 

I LATER  THIS  HILL  BE  EXTENOEO  TO  ALL 
! VOLATILE  LOCATIONS. 

If  NOT  ISPSOK ( . LN, . RN)  then  return  0| 

If  (X<-FINOLEFT(.LN,  .RN) ) neq  0 

then  If  .X  leq  (SIf1PLEVAL+l)/(l+(.LNIf100EI  neq  GENREGII  then  return  -1 

endj 

If  .RNINOOEX)  eql  SPLUSOP  then 
return 

If  .RNIOCCFI  eql  1 

thon  if  . RN CCSCDMPLI  eql  0 then  1 also 
if  ISNEGNOT  (.LN, . RN  I OPR  II ) then  -1| 

If  . RN ( NOOEX I eql  SF3T0RE  thon  return  1| 

If  . RN ( NODE XI  leq  MAXOPERATOR  thon 

if  .RNINOOEXI  eql  S00T0P  thon  roturn  0 else 
If  .RNINOOEXI  eql  SSHRBOP  then  return  1 else 
If  .RNINOOESIZEFI  eql  2 then 
bogin 

macro  PROCEEDoex i lb  I ocXJ| 

If  .RNITPATHI 

then  <LRN-.RNI0PR2It  RRN*-.  RN  (0PR1) ) 
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• Ise  (LRN<-.RNtOPRl)  | RRN*-.RN[0PR2))| 
if  .RNIRECF)  eql  .LRNtRFGF)  then 

if  . RN ICSCOMPL)  gtr  S IhPLEVRL  then  PRDCEEOj 
If  FINOLEX  f.LN, . LRN>  than 

If  SPEC IRLCRSES  Ihon  PROCEED  tilt  return  B| 

If  .LNLEX (SSPFJ  gtr  PF016  than  PRDCEEOj 
If  .LRNLEX  tLTYPFJ  eql  GTTYP  than 
If  . LRN tNOOEXJ  aql  SFSTORE  then  rtlurn  6| 
return  1 - FINOLEX f.LN, ,RRN) 
end; 

I < bog  ;n 

mac.  o TRUNC (X)» (X-MOXCPERRTOR) ij  t 

ONECF  (TRUNC  (.RN  tNOOEXJ  ) ,BIT4  (TRUNC  (SYNIF) , TRUNC  (SYNCRSE) , 

TRUNC (SYNSEL) , TRUNC (S YNLRBEL) ) ) 
end 

then  return  i 

• l*a  If  . RN  tNOOEXJ  lag  MflXQPERPTQR  than 
begin 

LRN^if  ,RN  tTPRTHJ  then  . RN  I0PR2)  else  .RNtOPRlJj 
if  . LRNLEX  tLTYPFJ  nog  GTTYP  then  return  0; 

If  . LRN  tNOOEXJ  eql  SOOTOP  then 
bog  in 

LLRN-.LRNIOPRU  | 

If  . LLRN  eql  .LN  then  roturn  - (SPEC IRLCOSES > 
else  If  EQLPOSSIZE  (.LLRN,  .LN> 
thon  If  SPEC I0LCRSES 

then  (LRNtCOOEOJrlj 
LRN  tflOOE  J «-0j 
LRNtREGFJ-.RNtREGFJ  j 

return  -1); 

end; 

end; 

0 

end; 


rout  ine  TR YSIMPLESTORE  (NODE, LN, RN)  « 
bogin 

h.ap  GTVEC  NODE : LN : RN ; 
bind  LEXEME  LNLEX=LN: RNLEX^RN; 
if  . NOS IMPLESTORE  then  return  8j 
while  1 do 
begin 

macro  00BIN0o( 

case  (S IMPLESTORE  (.LN, . RNJ+l)  of 

set 

Z-U  (BINDSTORE  (.LN, . RNEREGFJ ) j RN IRCtITF } « f « I ■•  > . 
t OX  0| 

7-  17,  If  not  .N0SIMPLEST0RE  then  BINDSTORE  (.LN,  .RNIRECF)  J 
tes 

>*! 

If  . RNLEX tLTYPFJ  neq  GTTYP  then  return  NOVRLUE; 

If  . RN  tNOOEXJ  leq  MUXOPERRTOR  then  return  OOBINOj 
select  . RN  tNOOEXJ  of 
nsBt 
SYNIF: 
begin 

I < OOBINO  then 
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(TRYSlflPLESTORE  (.NODE,  .LN,  .RN10PR31  )| 
TRYSlflPLESTORE  (.NODE,  .LN, .RNI0PR41 > >| 
patupn  NOVRLUE 
and; 


SYNCRSE: 

begin 

H 00B I NO  thon 

Incp  1 iron  2 to  .RN[N0DESlZEF)-2  do 
TRYSlflPLESTORE  (.NOOE,  .LN,  .RNIOPERRND(.  I>) ) | 
peturn  NOVRLUE 
end; 


SYNSELi 

begin 

If  OOOINO  than 

Incp  I f pom  2 to  .RN1NDDE91ZEF) -3  by  2 do 
TRYSlflPLESTORE  (.NOOE,  .LN,  .RNtOPERRNOf.  D) ) | 
petupn  NOVRLUE 
and) 


SYNCOMP: 

CONTINUE  RNf.RN  I0PERRN0 (.RN INODES 1ZEF)  -1)1 1 


SYNLRDELi 

(OOOINO;  PttUPn  NOVRLUE); 


SFSTOREi 

OOB1NQ; 


•iHayit 

paturn  NOVRLUE 

taan| 

and; 

NOVRLUE 

end; 
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NOROEF  (STORE,  < 

local  LEXEME  LOPs ROP , GTVEC  T,  Ti,T2| 

Tl*-T2*.0( 

It  . NODE  1TPRTHJ 

then  (LOP*. NOOE  (0PR2J  | ROP*-.  NOOE  I0PR11  j Ti-.MYTN) 
• It*  (LOP*. NOOE tOPRl)  | ROP*. NOOE  10PR2J  | T2-.MYTNJ) 
TNP  (.NOOE  10PR11 , . Ti > j 
> TNP  (.NOOE  10PR2) , . T2)  | 

It  <T*-.MYTN>  net|  6 than 
begin 

It  . T (REQOI  eg  I flEflREQDB  then 

It  .T1REGFJ  eq I .ROP  tROORFJ  th»n 
T*0( 

end) 

PREFMOVE  (.ROP, .HYTN) | 


It  .T  • q I 0 
then 

begin 

TRYS1MPLESTORE (.NOOE, .LOP, . ROP) | 

PREFMOVE (.ROP, . L0P> | 

end 

•It*  PREFMOVE (.HYTN, .LOP) j 


>>l 


macro  INITPRRIIOESCn  (LNKGO- . LNOME  [LNKGOESCFl  | PflRHNO*0>», 
NEXTPRRMOESCo 
begin 

It  <PRRMNO*.PflRMNO*i>  gtr  .LNKGO  (LNKGSIZEF) 
then  PT*-STRCKPRRM 

e Ise 

<PT*.LNLG0  (PRRMTYPE(.PRRMND)J  | 
PL*.LNKGOIPRRMLOC(.PflRMNO>) ) 

• ndS; 


routine  SPRNPRRMSe 
begin 

local  GTVEC  T,  ITEM  L| 
deer  I Iron  .CRLLSTK (CUROJ  to  D do 
FORRLLTN(T, CRLLSTK  (LSELEM  (.  I>  J , 

L*.T(TNLIFELIST1 | 

It  .L (L1FEST0PJ  lai  .LON  then  ULIFESTOPJ-.LON 


NOVRLUE 

end; 


J 


■ 
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i 

i 


i ’ 


f I 


I 


. 


macro 


INITCALL=INITSTK (CALLSTK) $, 

PUSH  CALL oPUSHSTK (CALLS  TK>$, 

NO  TEC ALL «A00T0T0S (CALLSTK , TNREP  < . SUBHOOE IREGF) >)S, 
POPCALL*POPSTK (CALLSTK  > S, 

RELEASECALLoRELEASESPACE  <GT , . C ALLSTK , STKS I ZE ) S; 


TLROEF  (CALL,  ( 

external  LIFEBLK) 

bind  SWITCHRECISTERn  177570, 

bind  STVEC  LNAME=NOOE tOPRl) ) 

local  GTVEC  SUBNDOEsTNsLNKGO,  DLDN , N , P ARMNO , P T , PL , FSP » 


TNP  (.NODE  (DPR2>  ,B)  | 
INI  TPARflOESC; 


FSP»-1  •, 

If  .N00EIN00ESI2EF)  glr  2 Ihon 
bogin 
PUSNCALL) 

Incr  I from  2 lo  .NODE INODES I2EF) — 1 do 
begin 

NEXTPARMOESC) 

SUBNDOE*-.NDDE  (OPERAND  (. I>) | 
OLDN-.LON+l) 

TNP (.SUDNOOE , 0> ; 

TN-.SUBNDOE IREGF) > 

TN  ITNLIFELISTJ  »-L IFEBLK  (.LON,  .LON)  j 
NOTECRLL ; 

SPANPARflS  <)  | 
case  ,PT  of 
eel 


79.  slack  parm  1 
begin 

if  .FSP  then 

begin  local  0; 

FSP-8)  D*-.DTDSTK  ILO TO)  | 

N*-.0TEI1PS  ECURD1 1 
S A VO TO) 

if  < (.DTEI1PS (CURD)  nog  .0> 

or  (.NDDEINOOESIZEF)  aqi  3) 
or  (.NDDEINOOESIZEF)  eql  5>>  than 
if  TRYSPOYTEHP(.TN,.N)  than 

exi  lease  (TNIREQO)  -tlEHREQOB) 

end) 

if  not  TRYSPDYTEtIP C. TN,N«-.N*1> 
then  DPENOYTEtIP  (.TN, . OLON , 0> ) 

OTOSTK  ILDTD)  «-.N; 

TNIREQO)  ►HEHREQOB 
and) 


_ .A 
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71i  specific  register  7 
begin 

iocel  LEXEME  SUBSUB, TTN; 

SUBSUB«- . SUBNOOE  tOPRll  t 
If  . SUBSUB ILTYPFJ  eql  GTTYP  then 
begin 

TTN«-.GTI. SUBSUB, REGF)  | 

I f .TTN  geq  8 then 
TNITNPERHITK.TTN; 

end  | 

TNSRREQO  <• TN, .PL)  | 

•nd| 


72 ! (literal)  memory  7 
begin 

TN  CTNLIT8IT1  .-TRUE; 

TN  ITNL I TLEX)  «-L  ITLEXEME  < . PL>  j 

TN(REQO) rMEHREGOB 

end; 


73:  (named)  memory  7 
begin 

TNtREGFU.PLi 
TN  (REQ03  «-f1EHREG0B 
•nd 


;«bj 

If  not  ,FSP  then 

KILLPOTEflPSt.  SUBNODE, 

(If  (.OTEW’SlCURDl  eql  .N>  or  (.1  eql  . NOUE (NODES IZEF)  — 1) 
then  .N  cist  .N+l>) 

end; 

POPCRLL 

end; 

NOTEOTO; 

i f not  .FSP  then  P0P0T0 j 

if  (.OTEMPSICUROI  goq  (STKSIZE-.MflXPRRHS))  or  . CUTBRCFSTACK 
then  KILLOYTEfIPS  (.  H00E> ; 


If  . LNflHE ILNKGTFI  neq  INTRRPTLNFGT  then 
begin 

If  .MYTH  Iss  8 then  HYTN«-GETTN() ; 

TNSRREQO  < . NYTN, VREGNUH) ; 

end; 

If  (TN*.hYTN)  noq  0 then 

If  . NODE (X0PR2J  eql  .LXHRLT 
ther, 

bogin 

TN  (REQO)  *-f1ENREQ0B| 

TN  CTNLITB IT3 -TRUE| 

TN IRECF) -SMI TCHREC ISTER; 
end 


else 


\ 


r 


122 


An  Approach  to  Global  Register  Allocation 


TNSRREQ0(.Th,VREGNUI1>| 


• 1 


N OTEUSE < . NODE I0PR2) ,.L0N>t 
NEKTUON j 

N0TECREflTI0N<.TN,,L0N>| 


UPGATELON; 

SETBOUNO; 

>>t 


routine  LOROR0 <N 00E > = 
begin 

map  GTVEC  NCOE; 
local  GTVEC  TX; 

it  (TXr.NDOCtREGF) ) eql  8 then  return  8| 
TX  I8N0TYP)  ►BNOREG; 

TX  tREQOI  *-SRREQ0B$ 

TX  IREGF)  ♦ VREGNUM; 

TX  IBNOLSTHORI  ►REGS  [VREGNUM]  | 

It  ISCLOSEO  < REGS  I VREGNUtt) ) 

then  OPENL 1ST  (REGS  (VSEGNUH) <8, 8>1 1 
.TX 
end; 


TLRDEF (rout ine,  < 

external  LBPRTCH , LBRETLIST, 

local  STVEC  RNRMEiLNRMEiLOESC,  GTVEC  TNtTL; 

LBRETLISU8; 

RNHME*-.  NODE (0PR2)  j 
R NAME  [RETIRE) ►.NODE ; 

LOESCr.  GT  tLNRME*-.RNOHE (LNKGNMF)  , LNKGDESCF)  t 
it  not  . RNYENRB  then 
begin 

deer  1 Iron  5 to  8 do  NULLLST (REGSt.  13 ) | 
incr  1 troll  1 to  . LOESC ILNKGS I ZEFI  do 
it  . LDESC [PRRMTYPE  <■ I >)  eql  REGPRRH 

then  OPENL  1ST  (REGS  t.LOESC  [PRRHLDC  (.  I»  J ) | 

end) 

It  not  EMPTY  (.RNRtlE  IREGrDRMLSTJ  > then 
begin 

local  LSTHDR  L, ITEM  I| 

RNRtlE  [REGFDRMLSTJ  | 
until  (I-.  URL  INK))  eql  .L  do 
bog  in 

TL^GETTNO; 

PREFMOVE (. TL, .GT[. I [R0RT1TEN  <1>) ,REGF1 ) | 

TL  IBNOLSTHDR)  ►REGSt.  I [LORTITEM  (1))  J | 

TL [REQOI ►MEHREQODi 
TL  IREGF > ► . I (LORTITEM  (D) ; 
UPORTELIlROfiTITEMU))  ,X,2> 
end 

endj 
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If  (TN-.NOOEIREGF))  naq  6 than  TN«-L0R0Rfl ( . NOOE) j 
if  .SlflPLELIFE  than  NOOE  IREGF)  «-GETTN  O » 

TNP  < .NOOE  (0PR11 , 0) | 

LBPRTCH  (.LBRETL1ST)  j 
UPOflTELON ; 

PREFflOVfc  (.NOOE  C0PR11  ,N00£ CRECF) TN>  | 

>>l 


NOROEF (flNPOR, ( 

Ic :a I l.OP, R0P.0P1 ; 
loc..l  ITEM  LBLOP : LBROP j 
bind  LEXEME  LEK2-N00E C0PR2)  | 

LOP^ROP^Bj 
If  . NOOE  CTPfiTH) 

then  (R0P-.I1YTN|0P1-.N00E10PR2)) 
else  (LOP-.t1YTN;OPl-.NOOElOPRl) ) i 
TNP  (. NOOE  tOPRH,.I.OP)| 

If  FLOIIRES  then 
bog  in 

LBLOP-NEWBLK  <2) j 
LBLOPILBSUCC  (l)3-.CBSTRRTj 
SRVOTOj 
and; 

TNP  (.NOOE  (0PR2) , .ROP>  | 
if  FLOWRES  than 
begin 

LBROP-NEUBLX(l)j 

LBROP  (LBSUCC  <D)  -LBLOP (LBSUCC  <2 > ) -.CBSTRRTj 
if  . OTEMPS ICURO)  nog  .OTOSTK (LOTO)  then 

(KILLOYTEUPS  (.NOOEIOPR2) ) t SETNOTFPRRtl)  ( 
POPOTO ( 
return 
end  j 

If  not  RESREQ  then  returni 
PREFf10VE(.0Pl,.(1YTN>| 

>>l 


NOROEF (COMP, ( 

incr  I from  0 to  . NOOE (N00ES1ZEF3 -2  do 
TNP  (. NOPE  I0PCRRN0 (.  0)  ,0) ; 

TNP  ( . NOOE  (LRSTOPCRRNO) , . flYTN)  j 

if  RESREQ  then  PREFNOVE  (.  NOOE  ILOS  TOPER  0N01 , . tlYTN)  j 

>>S 


■ 1 r 
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NOROEF (If,  ( 

local  OTUNEVEN; 

I oc*  I ITEM  LBBOOL; LBTHENi LBELSE) 

BISULST  (.NOOc(OPRl) ) | 

TNP  (. NOOE  I0PH2)  ,0)  t 
LBBOOL-NEUBLK  (2)  i 
LBBOOL  (LBSUCC  (1)  J CBSTBRTj 
SnVDTDj 

TNP(.N00EC0PR3J, . MVTN) , 

LBTHEN-NENBLKU); 

LBBOOL  ILBSUCC  (2) ) *-,  CBSTflRT| 

RESETDTOj 

TNP  <.  NOOE  (0PR4I , . MVTNJ  j 
LBELSE«-NEIIULK  (1) ; 

LB  THEN  (LBSUCC  <1))«-LBELSE  (LBSUCC (!)  I -.CBSTRRT; 

DTUNEVCN<-( . OTEHPS  (CURO)  neq  . OTOSTK  tflOTOJ ) j 
M I NO  TO ; 

I f .OTUNEVEN  then 
bogln 

KILLFORKOYTEHPS  (.NODE  I0PR3) ) , 

KILLFORKOYTEHPS  (.NOOE  (0PR4) ) ; 

SETNOTFPRRH 

end; 

POPQTO; 

If  RESREQ  than 

(PREFHOVE  (. NOOE  I0PR3) , . MVTN)  ; PREFHOVE  (.NOOE  (0PR4) , .HYTN) ) | 
BINOLST (.NOOE (OPR5I ) | 

JJ, 


NOROEF (cnee, ( 

local  T, RES, OTUNEVEN, GTVEC  SUBNOOE; 
local  ITEM  LBSELiLBFRK,  FRKX,  GTVEC  LBFRKV; 

DTUNEVEN*0, 

BINDLSTE.  NOOE  (OPR1J ) , 

TNP  ( . NOOE  (OPR2) , 8) ; 

LOSEL*-NEWBI.K  (.NOOE (N00CSIZEFJ-3) ; 

LBFRKV«-GETSPRCE  (GT,  .NOOE  (N00ESIZEFJ-3)  ( 

SRVOTD; 

Incr  I from  2 to  .NOOE  CNOOES IZEFJ -2  do 
bogln 

Sl'BNOOC.-. NOOE  10PERRN0<.  I)  J j 
LBSEL  (LBSUCC  (.  1-1)  )►.  CBETRRT; 

TNP(. SUBNOOE, .HYTN), 

LBFRKV  ( . 1-2,0,36)  -LBFRK-NEIIBLK  (1)  | 

PREFMOVE  (.SUBNOOE, .MVTN) ; 

If  .1  gog  3 then 

If  . OTEfIPS  (CURO)  neq  .OTOSTK IMOTOI  then  0TUNEVEN.-1| 
If  .1  neq  .NOOE INOOES 1 2EF ) -2  then  RESETDTD  else  (1INOTO| 
end) 

Incr  I fro*  1 to  . NOOE  CN00ESI2EF) -3  do 

(LBFRK-.LBFRKVI. 1-1,0,36);  LBFRKtLBSUCC<l)I*.CBSTRRTI| 
RELERSESPRCE (GT, .LBFRKV, .NOOE (NOOES IZEFJ -3> | 

If  .OTUNEVEN  then 
bogln 

Incr  I froe  2 to  . NOOE (NOOCS IZEFJ -2  do 
KILLFORKOYTEHPS  (.NOOC (OPERRNO (. Ill); 

SETNOTFPRRH 


mz-W1 
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end) 

P0P0T0) 

BINOLST  (.NOOE  I0PERRN0  ( - HOOE tNOHES I2EF1  -1>) ) » 


rout Ina  FLO  OP  (NOOE, TARGET , TYPE) - 
bog  In 

I TNP  for  whl le-do,  untl!-do,  do-whlla,  and  do-uni  1 1 
I cases  0 through  3 of  typo  raapactlvaly 
map  GTVEC  NOOE; 
local  L1,L2) 

local  ITEM  LBBOOLiLBBOOV, L00PT0P j 
BINOLST  (.NOOE  I0PR1I ) | 

BINOLST  (.NOOE  10PR2I ) ; 

SB VO  TO; 

ENTLOOP; 

LBBOOLrNEHBLK(l);  LBBOOL ILBSUCC <I) ) *-LOOPVOP»-. CBSTPRT; 

TNP  (.NOOE  I0PR3I ,0)  | 
l<  .TYPE/2 

%0-Um  than  (LBBOOYrNEWBLK (1) ; LBBOOV ILBSUCC  (1>)  ►.CBSTRRT) 
Zg/U-0/,  alia  (LBB00L*-NEHBLn2>  | LBBOOL  ILBSUCC  (1)1  ►.CBSTRRT)  | 
I < (not  .TYPE(-1)) 

or  (bind  LEXEME  0P4-N00E 10PR41  j . 0P4 ILTYPF)  naq  CTTYP) 
then  (RESETOTO;  KILLOYTEMPS (.NODE  I0PR3)  > > | 

TNP(.N00E(0PR4]  ,0); 

It  .TYPE/2 

XO-W/UX  than 

begin 

LBB00LrNEUBLK(2); 

LBBOOL  ILBSUCC  <1 ) J L00PT0P , 

LBBOOL  ILBSUCC  (2>U.CBSTRRT . 
and 

XU/U-0 7.  alia 

begin 

LBBOOY^NEHOLK(l) j 
LBBOOV  ILBSUCC < 1> J - . L00PT0P ; 

LBBOOL  ILBSUCC  <2)1  «-.CBSTRRT 
end; 

RESETOTO;  XITLOOP; 

KILLOYTEMPS  (.NOOE  COPH43 ) | 

KILLOYTEMPS!. NOOE)) 

POPOTO) 

It  .MYTN  naq  0 than  MOVE  (LITLEXEME  (-1) , .MYTN)  ( 
end; 


NOROEF (UO.FLOOP (.NODE, .TARGET, 1) ) ( 


NOROEF  (DW.FLOOP  (.NODE,  .TARGET, 2) ) | 


NOROEF (OU.FLDOP (. NODE,.  TARGET, 3)) | 


NOROEF  (IDLODP,  ( 

local  L,  GTVEC  CV; 

I oca  I ITEM  LOPRE:LBBDDY, 

TNP  (.NODE  I0PR2I  ,8)  | 

TNP  < . NODE  (0 PR 31  ,0)  | 

THP  (.NODE  10PR4) , 0)  | 

BINOLST  (.  NDDE  lOPRi,! ) ) 

BINOLST  < . NODE  IOPRG J ) j 

PREFfIDVE  < . NODE  IDPR21 , . NODE  I0PR1J  > j 

CV*-.N00E  I0PR11  j 

SAVOTO-, 

ENTLOOP j 

LBPRE-NEUBLK  (2) ; LBPRE  ILBSUCC  <1>  J CBSTARTj 
TNP  (.NODE  I0PR7)  ,8) ; 

LBBOOY»-NEUOLK  (2) ; 

LBBOOY  [LBSUCC  <1)1 LBPRE  (LBSUCC  ( 1 > ) j 
LBBODY  [LBSUCC  (2>1^LDPRE  ILBSUCC  <2)1 CBSTORT, 
OPERATE  (.  NODE  I0PR3] , .NODE  10PR11  > j 
CDMI'ARE  (.NODE  I0PR1]  , . NODE I0PR41 ) ( 

RESETOTO j XI TLDDP, 

KILLDYTEMP5  (.NODE  [OPR 7] ) | 

PDPDTDj 

H . KYTN  naq  0 than  MOVE (LITLEXEHE (-1) , .HYTN) j 

> > » 


TLRDEF  ( labe  I , ( 

axtarnal  LBPRTCHj 
TLCOMMON(. NODE, .TARGET)) 

LBPATCH  (.NOOE  (LBPATCHUST) ) t 

KILLPD TEMPS  (.NOOE, .0 TEMPS ICUROJ ) | I 

SETNOTFPARM j 
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1 


> 


TLROEF  < laava, ( 

local  CTVEC  N; 
Ni-.ST(.N00E(0PR2)  ,L INKFLO)  | 

TNP  < . NOOE  tOPRll , .N  CRECP)  > ; 
ttUSTPflTCH  <N  (LBPRTCHLIST)  > j 
PREFMOVE  <.  NOOE  C0PR1J , .N  tRECFJ ) j 
NOTEOTO) 

>>1 


TLROEF fRLERVE,  < 

axtarnal  LBRETLISTj 
local  CTVEC  RNTNj 

RNTN-. NOOE  (OPR2)  | RNTN*-. RNTN IRETLOB1 j 
TNP  <.  NOOE  COPRll  , . RNTN(REGF) ) j 
PREFMOVE  (.  NOOE  COPRll , .RNTN  tREGF) ) ( 
MUSTPRTCK  (LBRETLIST)  j 
NOTEOTO) 

UPORTELON) 

))) 


TLROEF  (SYNNULL,  ( 

local  CTVEC  PRR; 
bind  LEXEME  LEX=NOOEj 
PRR*-.NOOEICSPRRENT) ) 

H not  .PRRIBOUNOl  than 
begin 

If  not  .PRRIOELRYEO)  than  NONBOGUS(PRR)) 
If  .PAR  naq  . LEX(ROORF)  than 

TLL  1ST  (FRSTLEXOUT  (GTTYP , . PAR) , 9) } 

and| 

NOTEOTO) 

UPORTELON; 

ACCESS (. NOOE) ) 

>>1 


] 


NDROEF  (select,  ( 

local  OTHERENO.OTUNEVEN.SRVOTC,  LEXFME  L,  CTVEC  OTHcRTN| 
local  ITEM  LBSEL;LBLEFT:LBRIGHT| 

0 TUNE  VEN*-  0 ; 

OTHERENO-L  UVRLUE  < . NOOE  'LRSTOPERRNOI ) j 
if  .OTHERENO  aql  0 
then  OTHER7N..0 

alia  NOOE COPERRNO <•  NODE CNOOESIZEFJ -2) J «-LEX0UT CTNTYP , OTHER TN^CETTN 0 ) } 
TNP  < . NOOE  CO  PR  11  , 0) ) 

LBSEL-NEHBLK(l)) 

LBLEPT-O) 

MOVE  (LITLEXEME  (0) , . OTHER TN) ) 

Incr  I fro*  1 to  . NOOE CNOOESIZEF) -3  do 
begin 

L*mN0DE  COPERPNO  < .1)1) 
if  .1 

than  ! laf  t part 
begin 

If  .LBLEFT  naq  0 


L 
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than  LBLEFT tLBSUCC  <2> J CBSTfiRT 
else  LBSEL 1LBSUCC <1 ) J ►. CBSTART ; 

If  .L(LTYPF)  noq  SELTYP  then  (TNP(.L,0)  j COMPARE  (.NODE  10PR1)  , .1) ) | 

LBLEFT-NEUBLM2); 

end 

else  Irlqhl  perl 
beg  in 

If  .1  eql  2 then  SflVOTC*-.  OTEtIPS  ICURD)  ; 

SnVDTD; 

LBLEFT (LBSUCC  (1H «-  CBSTfiRT; 

TNPf.L, .MYTH); 

LBRIGHT«-NEUBLK  (1)  j 
LBRICHT  (LBSUCC (1)1  ►.CBSTART) 

If  . OTEHPS  (CURO!  neq  . SflVOTC  then  0TUNEVEM*-1  j 
RESETOTO; 

KILLOYTEMPS  f.L)  j 

POPDTO ; 

end) 

If  .1  leq  .OTHERENO  the.'  ACCESS <. OTHERTN) 
and; 

LBLEFT  (LBSUCC  (2  > J . CBSTART ; 

If  .DTUNEVEN 

then  SETNOTFPRRM 

else 

Incr  I from  2 to  . NODE  INODES J2EF1 -3  by  2 do 
begin 

local  LEXEME  OP; 
bind  CTVEC  SUBN00E=0P; 

OP-.NOOEIOPERflNOf.I)}; 

If  . OP (LTYPF J eql  GTTYP 

then  SUBN00E10TDELETE) ►OTDDNTCARE 

end; 

> > 5 


TLROEF  (ENABLE,  ( 

SETBOUMO; 

LOfiORD  (. NODE) ; 
NDTEOTO; 

UPOATELON, 
TNP(.NOOEtQPRl)  ,0) ; 
>>l 
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NOROEF  (SIGNAL,  ( 

extsrnnl  LBPATCH,LBRETLIST| 
Iocs  I GTVEC  TL5 
TNP  (.NOOE  I0PR11 ,8>  j 
L0ADR8 (.NODE) | 

MUSTPATCH  (LBRETLIST) , 
PREFflOVE  (.NODE  I0PR1) , .flVTN)) 
>>» 


Appendix  B 

The  Basic  Packing  Routine 


Listed  at  the  end  of  this  appendix  is  the  code  for  the  basic  loop  of  the  packing 
procedure.  The  routine  EVAL  is  called  recursively  to  evaluate  the  maximum  profit 
that  can  be  obtained  by  packing  subsequent  TNs.  Below  are  descriptions  of  the 
important  routines  called  by  EVAL. 

POSTUPDATE  remembers  the  current  complete  solution  as  the  most  profitable 

thus  far  examined.  When  the  algorithm  terminates,  the  last 
solution  remembered  is  an  optimal  solution. 

REALIZABLE  tries  to  pack  its  second  argument  into  a register.  The  first 

argument  is  true  when  REALIZABLE  is  called  during  reshuffling. 
The  value  returned  is  true  if  the  packing  succeeded  and  false 
otherwise. 

BESTCASEWITHOUT  returns  the  value  of  the  most  profitable  completion  of  the 
current  partial  solution  without  the  current  TN.  The  most 
profitable  completion  may  not  be  feasible. 

UNDESIRE  removes  its  argument  from  the  current  solution. 
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rout  Ino  tVAL  (PASTVAL,ME>» 
bog  In 

map  TNREPR  HE ; 
labs  I HOVER ICHT j 

macro  BOTTOHOFTREE  (X)*(.X(RLINK1  «ql  LHEA0<M»*l 

macro  TR YUP0flT£=  ( 1 1 -.HYVRL  eql  .GLBLMAX  then  PGSTUPORTEO)S| 

local  TEMP,  MINACCEPT,MAXSON,PREFMISSEO,  TNREPR  S0N| 

bind  MYVALaPASTVAL; 


EVALCNK.EVALCNT+l; 

ME  (TNFLO  (TNREGSTRIEO)  J *-.RESERVE0| 

H NOT  REALIZABLE  < ta Is*, .ME  (TNPTR) > than  return  8| 

H . SIHPLERESHUFFLE 
than  PREFMISSEOt-0 

else  PREFMISSED*-  (.ME tTNFLO (PREFF>)  nag  8)  and  <.ME(TNFLO(BNDTYP)J  naq  BNOPREF) | 
ME (TNFLO  (TNCONFNUM) ) wNOCONFSEEN; 

If  (MYVALr. PASTVAL  + .ME  tTNFLO  (TNCOSTM-.PREFHISSED)  gtr  .CLBLMRX 
than  (CLBLMAX. MYVAL|  MYVAL— . HYVAL>  | 

HINACCEPT*. GLBLMAX -aba <.MYVAL>| 

MAXSON-Oi 

If  BOTTOHOFTREE  (HE) 

then  <TRYUPOATE|  UN0ES1RE  (.ME  (TNPTR1)  | OORESHUFrl) 

a lie 

do  bogm 
SON-. ME; 

while  (SOh  •- . SON  (RL1NK) ) naq  LHEAD<8,0>  do 
MOVER IGHT  > 
begin 

If  .EVALCNT  gtr  .MAXREPS  then  (CUTOFF-truei  axltloop)| 

If  .SON (TNFLO (BNOLSTHOR)J  naq  0 then  laava  MOVER IGHT | 

If  .SON (TNFLO (TNCSUH''  laq  . H INACCEPT  than  exltloopi 
If  < TEH  P*-E  VAL  (abs  (.MYVAL) , . SON)  > gtr  .MAXSON  than 

(If  (MAXSON-.TEMP)  gtr  .MINACCEPT  then  MINACCEPU. TEMP)  | 
if  BESTCASEWI  THOUTt.  SON  ( TNPTR  J ) laq  . SOHtTNFLO (TNC0ST3)  than  exltloop; 
end; 

TRYUPOATEi 

UNOESIRE  (.ME (TNPTRJ ) , 

If  .TEMP  oql  .SON (TNFLO (TNCSUH) J than 

if  . SON (LL INK)  aql  .ME  then  ! optimal  completion,  don't  try  any  awre 
ex  1 1 loop; 

If  . SIHPLERESHUFFLE  thon  exit  loop; 
end 

t until  NOT  REALIZABLE  (falsa, .ME (TNPTR) ) | 

re  turn  (.ME  (TNFLO  (TNCOST))+.MAXSON-.PREFMISSEO> 
end| 


-V 


- - — 
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